on the Way to 100: Biodemographic Models and Methods 
in Genetic Analyses of Longitudinal Data 


Anatoliy I. Yashin, Konstantin G. Arbeev, Deqing Wu, Liubov Arbeeva, 
Alexander Kulminski, Irina Kulminskaya, Igor Akushevich, 

and Svetlana V. Ukraintseva 

Center for Population Health and Aging, Duke University, Durham, North Carolina 


In this article we clarify mechanisms of genetic regulation of human aging and longevity traits. The objective of this article is to 
address the issues in previous research of not reaching a genome-wide level of statistical significance and lack of replication in the studies 
of independent populations. We performed GWAS of human life span using different subsets of data from the original Framingham 
Heart Study cohort corresponding to different quality control procedures, and we used one subset of selected genetic variants for further 
analyses. We used a simulation study to show that this approach to combining data improves the quality of GWAS. We used FHS longitudinal 
data to compare average age trajectories of physiological variables in carriers and noncarriers of selected genetic variants. We used a 
stochastic process model of human mortality and aging to investigate genetic influence on hidden biomarkers of aging and on dynamic 
interaction between aging and longevity. We investigated properties of genes related to selected variants and their roles in signaling and 
metabolic pathways and showed that the use of different quality control procedures results in different sets of genetic variants associated 
with life span. We selected 24 genetic variants negatively associated with life span and showed that the joint analyses of genetic data at the 
time of biospecimen collection and follow-up data substantially improved significance of associations of 24 selected SNPs with life span. 
We also showed that aging-related changes in physiological variables and in hidden biomarkers of aging differ for the groups of carriers 
and noncarriers of selected variants. The results of these analyses demonstrated benefits of using biodemographic models and methods 
in genetic association studies of these traits. Our findings showed that the absence of a large number of genetic variants with deleterious 
effects may make a substantial contribution to exceptional longevity. These effects are dynamically mediated by a number of physiological 
variables and hidden biomarkers of aging. The results of this research demonstrated benefits of using integrative statistical models of 
mortality risks in genetic studies of human aging and longevity. 


1. INTRODUCTION 

Actuaries and demographers often analyze mortality data using survival models in which mortality rates are considered as 
parametric functions of age. Although such models are useful for many practical applications, there is a clear understanding that 
the explanatory power and predictive potentials of such models are limited, because parameters of these models do not characterize 
factors and processes involved in shaping age patterns of mortality curves, including genes and aging-related changes, as well 
as environmental factors and processes. These models were constructed to provide a good fit to mortality data, with little or no 
concern about biological interpretation of model parameters. The modern tendency toward developing personalized approaches 
to prevention and treatment of chronic aging-related diseases stimulates deeper insights into the nature of individual aging and its 
connection with health and survival outcomes. It is clear that the actuarial practice today has to deal with increasing knowledge 
about factors and mechanisms affecting morbidity and mortality risks and adjust its methods and models accordingly. 

Although the details of such adjustment are not clear to date and the discussion of possible alternatives is beyond the scope 
of this article, actuarial science will benefit from information on forces and mechanisms involved in shaping age patterns of 
mortality curves. It will certainly benefit from summarizing this information in the integrative mortality models whose parameters 
characterize biological processes developing in aging human bodies and their interaction with external conditions, as well as roles 
of genetic and nongenetic factors affecting mortality risks. Such models will evolve in parallel with accumulation of knowledge 
about individual aging-related changes and their connections with health and survival outcomes. The rapid growth of genetic 
data on participants of large longitudinal studies, as well as earlier estimates that about 25-30% of variation in life span is due 
to genetic factors, opens a unique opportunity to clarify the roles of genetic and nongenetic factors in aging, life span, and 
mortality risk. As in any complicated enterprise, the new findings from studies addressing these issues were accompanied by new 
challenges and problems. One set of problems concerns a special feature of genetic variants affecting aging- and longevity-related 
traits, which is often ignored in genetic association studies. These variants are involved in the process of mortality selection in 
genetically heterogeneous populations that causes changes in genetic structure of the population under study and affects the results 
of genome-wide association studies (GWAS) of aging and longevity traits. This indicates that to investigate genetics of human 
longevity-related traits the methods and approaches developed in genetic epidemiology for studying complex multifactorial traits 
have to be properly adjusted to the effects of mortality selection. 

The purpose of this article is to outline complications and problems accompanying genetic analyses of aging and longevity to 
date, to identify hidden reserves and underused research potentials of available data and methods, to discuss novel approaches for 
efficient analyses of available data, and to outline roles of related genes in aging and longevity. We also show that the use of these 
hidden reserves allows for addressing fundamental research problems on mechanisms of genetic regulation of aging and life span 
that have not been addressed before with these data. 

We start with the description of genetic and longitudinal data collected in the Framingham Heart Study (FHS) and used in our analyses. 
The individuals from the original FHS cohort were followed for more than 60 years with biennial examinations of physiological 
state and measurements of other biomarkers. This cohort is especially convenient for studying aging and longevity because almost 
all study participants from this cohort have data on life span. For historical reasons, not all members of this cohort were genotyped. 
The biospecimens (e.g., blood) for genotyped individuals were collected later and for many of them at different time points. 
For genetic analyses of longitudinal data the time of biospecimen collection became a new “baseline.” The age distribution of 
genotyped individuals at this new “baseline” may contain important information for genetic analyses of human aging and life span 
if this distribution has sufficient numbers of adult, old, and oldest-old individuals. Note that this information has not been 
used in earlier genetic analyses of these traits (Deelen et al. 2011; Lunetta et al. 2007; Nebel et al. 2011; Newman et al. 2010; Walter 
et al. 2011). These studies demonstrated low efficiency of statistical methods used in genetic association studies, weak associations 
of detected genetic variants with the traits of interest, and lack of replication of research findings in the studies of independent 
populations. These analyses detected many genetic variants associated with longevity. However, most of these variants have not 
reached a genome-wide level of statistical significance. 

Several factors are likely to be responsible for these problems. One is that the models currently used in GWAS 
of human aging and longevity are oversimplified and do not correspond to inherent complexity of aging and longevity traits. 
Specifically the traditional approaches to GWAS of these traits underestimate the possibility of pleiotropic associations of genetic 
variants with these traits: Genes may show different associations with mortality and life span at different age intervals (Atzmon 
et al. 2006; Yashin et al. 1999a; Yashin et al. 2000), and they may show pleiotropic effects on risks of distinct diseases (Yashin 
et al. 2015). These effects may also include tradeoffs: Genes may increase the risk of one disease and reduce the risk of another (Kulminski 
et al. 2013; Ukraintseva et al. 2010). 

One more reason for slow progress in detecting single genes involved in regulation of aging and longevity traits could be 
related to the multifactorial nature of such traits. Traits might be influenced by large numbers of common genetic variants, 
each having a small effect (Burton et al. 2007; Thomas et al. 2008; Yang et al. 2010; Zeggini et al. 2008). This property was 
considered a “disappointing feature of many discoveries made in GWAS, because such genetic variants may be of little predictive 
value” (Pawitan et al. 2009). The situation indicates the need for approaches capable of evaluating mechanisms by which many 
small-effect alleles may influence a complex trait of interest. In Yashin et al. (2010c) we investigated one such mechanism for 
genetic regulation of longevity. We hypothesized that the value of the trait (life span) depends on the number of the small-effect 
“longevity” alleles contained in the individual genome. The results of our analyses strongly supported this hypothesis and showed 
that joint influence of many small-effect longevity alleles on life span can be described as the “genetic dose—life span response” 
relationship. The existence of such a relationship brings a new perspective to GWAS of longevity and other complex traits. The 
construction of different versions of polygenic risk scores and properties of additive polygenic influence on life span have been 
discussed in Yashin et al. (2012a) and Yashin et al. (2012b). 
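
The dose-response idea can be illustrated with a minimal Python sketch. It assumes a hypothetical data layout (one dictionary of genotype strings per person and one dictionary naming the "longevity" allele at each SNP). The SNP identifiers, alleles, and life spans are invented toy values, not results from Yashin et al. (2010c). The sketch counts the longevity alleles each person carries and averages life span within each dose group.

```python
from collections import defaultdict
from statistics import mean


def genetic_dose(genotype, longevity_alleles):
    """Number of 'longevity' alleles carried by one person.

    genotype: dict mapping SNP id -> two-letter genotype string, e.g. "AG".
    longevity_alleles: dict mapping SNP id -> the allele counted as 'longevity'.
    """
    return sum(genotype[snp].count(allele)
               for snp, allele in longevity_alleles.items()
               if snp in genotype)


def mean_life_span_by_dose(people, longevity_alleles):
    """Average life span within each genetic-dose group."""
    groups = defaultdict(list)
    for genotype, life_span in people:
        groups[genetic_dose(genotype, longevity_alleles)].append(life_span)
    return {dose: mean(spans) for dose, spans in sorted(groups.items())}


# Hypothetical toy data (SNP ids, alleles, and life spans are invented).
longevity_alleles = {"snp1": "A", "snp2": "T"}
people = [
    ({"snp1": "AA", "snp2": "CT"}, 92.0),
    ({"snp1": "AG", "snp2": "CC"}, 81.0),
    ({"snp1": "GG", "snp2": "CC"}, 74.0),
    ({"snp1": "AG", "snp2": "CC"}, 79.0),
]
print(mean_life_span_by_dose(people, longevity_alleles))
# {0: 74.0, 1: 80.0, 3: 92.0}
```

In real data the dose groups would be compared with a regression model that adjusts for covariates such as sex and birth cohort; the group means here only show the shape of the calculation.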

The lack of replication, a serious problem for many GWASs of human aging and longevity traits, may partly stem from 
technical issues. Different research groups may detect different sets of genetic variants because they use different statistical models 
in genetic association studies of these traits. We discuss this issue and provide a reference to our article that addressed this problem 
in detail (Yashin et al. 2012b). We also show that one more reason for the lack of replication of GWAS findings could be differences 
in quality control (QC) procedures used in preparing genetic data for analyses. It turns out that different research groups often use 
different QC procedures that may reduce comparability of research findings. 

Most researchers studying the genetics of human longevity searched for genes that contribute to increased longevity. This strategy 
ignores the possibility that at least some part of exceptional longevity could result from the absence of many deleterious genetic 
factors that contribute to premature death, especially for individuals having large numbers of such “frailty” or “vulnerability” 
alleles or genotypes. The presence of many such “vulnerability” alleles has been hypothesized in the evolutionary theories of aging 
(Albin 1993; Charlesworth 2001; Hughes and Reynolds 2005; Ljubuncic and Reznick 2009). A number of detected variants show 
pleiotropic effects on mortality risk at distinct age intervals of the life course and different disease traits, as well as on dynamic 
characteristics of aging-related changes taking place during the life course. 

One more reason for low efficiency of genetic association studies of aging and longevity traits might be that conventional 
methods used in genetic association studies underutilize available information on aging, health, and life span. Most of these 
methods just ignore the available knowledge about the traits of interest accumulated in the research field and treat the limited data 
set used in the analyses as the only source of information about these traits. This practice misses the opportunities to realize high 
research potential of the available information (Arbeev et al. 2011; Yashin et al. 2007a; Yashin et al. 2013a; Yashin et al. 1999a; 
Yashin et al. 2000). It jeopardizes systemic integration of available information on aging and longevity, reduces the efficiency of 
data analyses, and slows the progress in better understanding of either the nature of these traits or factors affecting them. 

The power and biological relevance of GWAS can be enhanced by incorporating the biological and demographic principles of 
trait formation into the backbone of genetic analyses, through appropriate mathematical models and statistical functions, and by 
incorporating genetic questions into a comprehensive framework of dynamic analyses of longitudinal data. We discuss approaches 
to improving efficiency of genetic association studies and explain how combining genetic data with demographic information can 
be used for these purposes. We also show that combining genetic information from the data on age distribution at the time of 
biospecimen collection with that of follow-up data improves the accuracy of genetic analyses. 

An important advantage of using longitudinal data in genetic analyses of aging and longevity traits is the opportunity to 
evaluate survival functions and age trajectories of other biomarkers for carriers and noncarriers of selected genetic variants. The 
behavior of such survival functions may indicate how the effects of selected genetic variants on mortality risk change with age. 
The difference in average age trajectories of physiological variables for carriers and noncarriers of selected genetic variants will 
show how physiological aging-related changes are modulated by genetic factors. Note that such a possibility does not exist in the 
case-control studies. We illustrate this advantage by showing corresponding age trajectories for carriers and noncarriers of the two 
selected genetic variants. 

From the systems biology point of view the biomarkers measured in longitudinal studies represent only a portion of components 
of a multidimensional biological process, the coordinated aging-related decline in the organism’s functioning. 
Many important biomarkers of aging are not measured in longitudinal studies but have a substantial influence on age behavior 
of measured variables. The roles of such biomarkers in the mechanisms of aging-related changes were verified in other studies 
of aging including experiments with animal model systems. To be able to evaluate effects of measured and hidden components 
of aging-related changes on life span in their mutual connection, as well as to evaluate roles of genetic and nongenetic factors 
in these processes, we developed the genetic stochastic process model of human aging and mortality that includes hidden and 
observed components of these changes. We show how hidden biomarkers characterizing stress resistance, adaptive capacity, 
physiological norms, effects of allostatic adaptation, and allostatic load can be incorporated into a dynamic stochastic process 
model (SPM) of human aging, health, and longevity and how this model can be used in statistical analyses of genetic, static 
nongenetic, and phenotypic longitudinal data. The prototype of this model was first described in Woodbury and Manton (1977). 
Its application to analyses of longitudinal data is described in numerous publications (see Yashin and Manton 1997 and references 
therein). 
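
To make the structure of such a model concrete, the following Python sketch simulates one individual under a one-dimensional version of the quadratic hazard form commonly used in the SPM literature cited above. The functional forms, parameter values, and the mapping of parameters to the hidden biomarkers are illustrative assumptions, not estimates from this article: `a` stands for adaptive capacity (the strength of the feedback), `f_allo` for the trajectory the variable is forced to follow (the effect of allostatic adaptation), `f_norm` for the physiological norm, `q` for the vulnerability term whose growth reflects declining stress resistance, and the gap between `f_allo` and `f_norm` for allostatic load.

```python
import math
import random


def baseline_hazard(t):
    """Hypothetical Gompertz baseline mortality (assumption)."""
    return 1e-4 * math.exp(0.08 * t)


def f_norm(t):
    """Hypothetical physiological norm, e.g. for diastolic blood pressure."""
    return 80.0


def f_allo(t):
    """Hypothetical trajectory toward which the variable is driven."""
    return 80.0 + 0.2 * (t - 40.0)


def simulate_spm(t0=40.0, t_max=100.0, dt=0.1, y0=80.0,
                 a=-0.1, b=2.0, q=2e-5, seed=0):
    """Euler scheme for dY = a*(Y - f_allo(t))dt + b*dW with
    hazard mu(t, Y) = baseline_hazard(t) + q*(Y - f_norm(t))**2.

    Returns (age at death, or t_max if alive at the end; list of (t, Y))."""
    rng = random.Random(seed)
    t, y, path = t0, y0, []
    while t < t_max:
        path.append((t, y))
        hazard = baseline_hazard(t) + q * (y - f_norm(t)) ** 2
        if rng.random() < 1.0 - math.exp(-hazard * dt):  # death in this step
            return t, path
        y += a * (y - f_allo(t)) * dt + b * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    return t_max, path


age, path = simulate_spm(seed=1)
print(round(age, 1), len(path))
```

In a genetic version of such a model, carriers and noncarriers of a variant would be given separate parameter sets (for example, different values of `q` or `a`), and the question is whether the data support that difference.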

The SPM for analyzing hidden components of aging has been developed and validated in the studies of subsets of the longitudinal 
data (Arbeev et al. 2009; Arbeev et al. 2011; Arbeev et al. 2012; Yashin et al. 2011a,b,c; Yashin et al. 2007a; Yashin et al. 2012c; 
Yashin et al. 2010a; Yashin et al. 2007b; Yashin et al. 2008; Yashin et al. 2012d; Yashin et al. 2013a; Yashin et al. 2009). The 
use of the genetic version of such a model (GenSPM) (Arbeev et al. 2009) allows us to synthesize all of the components and the 
outcomes and to evaluate how genetic effects on aging and longevity traits are mediated by physiological variables and the key 
biomarkers of aging. Note that an important advantage of using the genetic version of the stochastic process model of human aging, 
health, and mortality is the opportunity to study roles of genetic factors in hidden biomarkers of aging and their connection with 
health and survival outcomes. These biomarkers (stress resistance, adaptive capacity, age dependent physiological norms, effect 
of allostatic adaptation, allostatic load) are considered in the model as a part of the biological mechanism involved in forming 
partly observed age trajectories of physiological variables as well as risks of health and survival outcomes. Such analyses allow 
for testing whether age trajectories of biomarkers of aging depend on the individual’s genetic background and whether parameters 
describing age patterns of mortality rates as well as other hazard rates differ for individuals with different genetic background. 
Finally, we investigate properties of detected genes and their biological roles in regulating aging and life span. 


2. LONGITUDINAL DATA IN GENETIC STUDIES OF AGING AND LONGEVITY: THE FRAMINGHAM HEART STUDY 

The Framingham Heart Study (FHS) includes 14,428 participants, of whom 9,215 were genotyped for 550,000 SNPs. 
The FHS original cohort was launched in 1948 (exam 1), with 5,209 respondents (55 percent females) age 28-62 residing in 
Framingham, Massachusetts, who had not yet developed overt symptoms of cardiovascular disease (Dawber et al. 1951) and 
continued to the present with biennial examinations (30 exams to date) that include detailed medical history, physical exams, and 
laboratory tests. The offspring cohort (FHSO) was launched in 1971 (with eight exams to date) with 5,124 second-generation 
individuals (52 percent females), who are the original FHS participants’ adult children and their spouses (Kannel et al. 1979; 
Splansky et al. 2007). The third generation cohort, consisting of the grandchildren of the original cohort participants, having at 
least one parent in the offspring cohort, totaling 4,095 individuals (53 percent females), was added to the study with the first 
examination completed in 2005 (Splansky et al. 2007). The three FHS cohorts use similar research protocols so that comparisons can 
be made. Across the three generations, 99.7 percent of participants are white. 

Phenotypic traits collected in the FHS cohorts over 60 years and relevant to our analyses include life span, cause of death, age 
at disease onset (cardiovascular diseases [CVD], cancer, and neurodegenerative disorders [NDs]), indices characterizing disease 
and recovery progress (blood, urinary, mental, and physical tests; use of medication and other treatment), internal and external 
disease risk factors, including diastolic blood pressure (DBP), systolic blood pressure (SBP), ventricular rate (VR), blood glucose 
(BG), serum cholesterol (CH), body mass index (BMI), and demographic, behavioral, and life history characteristics and selected 
markers of aging. The occurrence of CVD, cancer, NDs, and death has been followed through continuous surveillance of hospital 
admissions, death registries, clinical exams, and other sources, so that all the respective events are included in the study. 

FHS genetic data include 9,215 individuals from three generations of the FHS who were genotyped for genome-wide SNPs, with 
results available through the Framingham SNP Health Association Resource (SHARe). The genotyping was conducted using the 
Affymetrix platform with about 550,000 SNPs representing a significant part of human genome variability. Individual information 
is publicly available through the Framingham SHARe upon request. In this article we will discuss challenging issues of genetic 
association studies of human aging and longevity, describe approaches that improve quality of genetic analyses, and illustrate the 
use of these approaches in the analyses of data from the original FHS cohort. 


3. COMPREHENSIVE GENETIC ANALYSES OF HUMAN LIFE SPAN 

Recently, studying the details of genetic connection between individual aging-related changes and mortality rate was considered 
a matter of high practical importance in gerontological literature. This is because the hypothesis that one can reduce burden of 
chronic aging-related diseases by postponing individual aging processes or by slowing the individual aging rates received support in 
experimental studies of aging with laboratory animals. However, despite the high potential of available data and evident progress 
in clarifying the genetic nature of many complex traits, the genetic studies of human aging and longevity traits had limited 
success. The decades of genetic studies of longevity using candidate genes showed that new conceptual ideas are needed to better 
understand genetic mechanisms involved in regulating aging-related traits (De Benedictis et al. 2001; Finch and Tanzi 1997). The 
expectations that the use of GWAS will rapidly clarify these problems have not been realized. The results of these studies were 
often controversial. Most associations have not reached genome-wide levels of statistical significance and suffered from the lack 
of replication. The research findings were difficult to explain from the evolutionary theory point of view. This situation indicates 
the need for developing new concepts and better methods for analyzing genetic data on such traits (Di Rienzo and Hudson 2005; 
Teslovich et al. 2010; Vijg and Suh 2005; Yashin et al. 2012b). 

In explaining slow progress in genetic analyses of aging and longevity traits, the presence of pleiotropic and age-dependent 
genetic associations plays an important role (Kulminski et al. 2010; Summers and Crespi 2010; Williams and Day 2003; Yashin 
et al. 1999; Yashin et al. 2000; Yashin et al. 2001). It turns out that the influence of the APOE polymorphism on health- and 
longevity-related traits shows such effects. 


3.1. Effects of APOE Alleles on Life Span and Aging-Related Diseases 

From the FHS and FHSO generations, 5,182 individuals have information on the apolipoprotein E (APOE) e2/3/4 polymorphism. 
The association of APOE alleles and genotypes with longevity is confirmed in a number of studies; however, the pleiotropic 
properties of APOE polymorphism are not well known. In Kulminski et al. (2013) we show that the APOE e4 allele can play 
detrimental, neutral, and protective sex-specific roles in the etiology of CVD at different ages and in different environments. 
Specifically the role of the e4 allele in onset of CVD is age- and generation-specific, constituting two modes of sexually dimorphic 
genetic trade-offs. In offspring, the e4 allele confers risk of CVD primarily in women and can protect against cancer primarily 
in men of the same age. In the parental generation, a genetic trade-off is seen in different age groups, with a protective role of 
the e4 allele against cancer in older men and its detrimental role in CVD in younger women. The aging-related processes can 
modulate the strength of genetic associations with total cholesterol (TC) in the same individuals at different chronological ages. 
Substantial differences in the effects of the same APOE allele on CVD and TC were also observed across generations (Kulminski 
et al. 2013). These results suggest that aging-related processes and changing environments may modulate genetic effects on health 
span. The analyses also show that the e4 allele carriers live longer without cancer than the noncarriers of this allele in each 
generation (Kulminski et al. 2013). An analysis of the association of the APOE e4 allele with life span in two generations of participants of 
large longitudinal studies, the Framingham Heart Study and the Long Life Family Study, showed that women’s life span is more 
sensitive to the e4 allele than men’s in these populations (Kulminski et al. 2014). The strongly adverse effect of the e4 allele is 
observed for women between 70 and 95 years of age. Cardiovascular disease, cancer, and neurodegenerative disorders did not 
mediate the association of the e4 allele with life span. However, cancer nonadditively enhanced this effect, resulting in 4.2 years 
of difference in mean life span for the e4 allele carriers compared to the noncarriers. These properties of genetic mechanisms 
manifested in different genders, ages, and environments call for more detailed and systemic analyses beyond those used in current 
large-scale genetic association studies. 

FIGURE 1. Effects of Sample Call Rate in GWAS. Source: Framingham Heart Study (a limited access dataset), original cohort. 


3.2. The Influence of Quality Control (QC) Procedures Used in GWAS on the Results of Genetic Analyses 

It is important to note that a substantial part of the original FHS cohort has information on life spans of the study participants. 
To be able to use a mixed effects model in the analyses of data on life span we have to impute censored data. For this 
purpose, we estimated residual life spans for individuals censored at a given age by calculating average life span of deceased study 
participants who survived up to this age. In the analyses of genetic connection with life span we imputed the data for male and 
female members of the original FHS cohort with censored life spans by adding average residual life span to the age at censoring. 
Then we prepared several datasets for genetic analyses using different QC procedures. The sample call rates, SNP call rates, 
minimal value of minor allele frequencies (MAF), and minimal p value to control for Hardy-Weinberg equilibrium (HWE) are 
specified in Figure 1. One can see from this diagram that the use of different sample call rates results in different numbers of study 
subjects participating in genetic analyses. 
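
The imputation of censored life spans described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records are hypothetical (age at death or censoring, and a death indicator), the function is applied to one sex at a time, and a censored person with no deceased reference group keeps the censoring age.

```python
from statistics import mean


def mean_residual_life(age, death_ages):
    """Average remaining life span among deceased participants
    who survived beyond `age`."""
    later = [d for d in death_ages if d > age]
    if not later:
        return 0.0  # assumption: no reference group, keep the censoring age
    return mean(later) - age


def impute_life_spans(records):
    """records: list of (age_at_exit, died) for one sex.
    Returns observed life spans for the deceased and imputed ones for
    the censored (age at censoring + average residual life span)."""
    death_ages = [age for age, died in records if died]
    return [age if died else age + mean_residual_life(age, death_ages)
            for age, died in records]


# Hypothetical toy records: (age at death or censoring, died?)
women = [(72.0, True), (85.0, True), (91.0, True), (80.0, False), (95.0, False)]
print(impute_life_spans(women))
# [72.0, 85.0, 91.0, 88.0, 95.0]
```

The person censored at age 80 receives the mean age at death of those who died after 80 (here 88), which is what adding the average residual life span to the age at censoring amounts to.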

Using these data and the mixed model realized in EMMAX software (Kang et al. 2010), we performed GWAS of human life 
span controlling for smoking (ever or never) and birth cohort. The analyses resulted in several sets of genetic variants (Fig. 1) 
having genome-wide significant negative associations with life span. Note that although the numbers of variants detected in these 
analyses differ for different datasets, a substantial portion of variants remains common for all analyses. This might be because 
all these analyses are performed with subgroups of the same population (genotyped individuals from the original FHS cohort). In 
case of independent populations the intersections among selected sets of variants could be empty because of differences in genetic 
backgrounds and the histories of exposure to external conditions. This explains why it might be difficult to confirm research 
findings by comparing them with those obtained by different research groups. The use of different QC procedures in genetic 
association studies of aging and longevity traits by different research groups could contribute to the nonreplication of the research 
results. These results also cast doubt on the accuracy of the results of meta-analyses dealing with findings from studies that used 
different QC procedures. 

One can conclude from this diagram that parameters in call rates influence the sample size as well as the number of detected 
genetic variants. In cases where the sample size of the data is limited, the stringent control for sample call rate may substantially 
reduce the number of people in the study. The difference in results of analyses indicates that differences in parameters of the QC 
procedures may contribute to the lack of replication using independent populations if these studies used different QC procedures. 
Thus, one approach that might help resolve the lack of replication is to use similar QC procedures in different 
analyses. Next we look at other methods and approaches that can improve the quality of genetic analyses. 
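
A small Python sketch shows how the QC parameters interact. It is not the pipeline used for Figure 1: the thresholds are hypothetical defaults, the genotype matrix is an invented toy example, and HWE is checked with a simple one-degree-of-freedom chi-square test. Samples are filtered first by sample call rate, and SNPs are then filtered on the retained samples by SNP call rate, MAF, and HWE p value.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing calls (genotypes are 0/1/2 or None)."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def maf(genotypes):
    """Minor allele frequency among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def hwe_pvalue(genotypes):
    """Chi-square (1 df) test of Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)
    if p in (0.0, 1.0):
        return 1.0  # monomorphic SNP: nothing to test
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [called.count(k) for k in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df


def qc_filter(matrix, sample_cr=0.95, snp_cr=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """matrix[i][j] is the genotype of sample i at SNP j.
    Returns the indices of retained samples and retained SNPs."""
    samples = [i for i, row in enumerate(matrix) if call_rate(row) >= sample_cr]
    snps = []
    for j in range(len(matrix[0])):
        col = [matrix[i][j] for i in samples]
        if (call_rate(col) >= snp_cr and maf(col) >= min_maf
                and hwe_pvalue(col) >= min_hwe_p):
            snps.append(j)
    return samples, snps


# Hypothetical toy matrix: 4 samples x 3 SNPs, None = missing call.
toy = [[0, 1, 0], [1, 1, 0], [2, 1, 0], [None, None, 0]]
print(qc_filter(toy, sample_cr=0.5))  # ([0, 1, 2], [0, 1])
print(qc_filter(toy, sample_cr=0.3))  # ([0, 1, 2, 3], [])
```

In the toy example, relaxing the sample call rate keeps one more person but, through that person's missing calls, removes both informative SNPs. This is the kind of dependence between QC settings and detected variants discussed above.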


3.3. How the Quality of Genetic Analyses Can Be Improved 

The statistical methods used in genetic association studies of human aging and longevity are simple and convenient for 
performing multiple calculations relatively fast. This property makes such methods attractive for genome-wide association studies 
with large amounts of genetic information. However, statistical models linking genetic factors with phenotypes of interest look 
oversimplified from the biological point of view. The genetic analyses could benefit from using more sophisticated models adjusted 
to biological complexity of aging and longevity traits. The traditional approaches to genetic analyses treat each new dataset as if it 
is the only dataset available. Both the findings from earlier genetic studies of human longevity and other relevant information 
available in the research field that could make genetic analyses more efficient are ignored. The integration of the wealth of 
available knowledge about aging and longevity and its use in the analyses of data is needed. Such integration can be done, for 
example, by including informative data that have not been used before in the analyses. The access to longitudinal data on aging 
and longevity opens additional opportunities for efficient genetic analyses. Combining genetic data from the age distribution at 
baseline (at the time of biospecimen collection) with those from the follow-up studies in joint analyses improves the quality of 
genetic estimates. The advanced statistical and computer models of aging-related changes and life span are capable of incorporating 
appropriate information on hidden biomarkers and mechanisms of aging changes in the model’s structure. Such incorporation 
requires understanding certain biological regularities and connections and has to be performed by an interdisciplinary research 
group. The use of these approaches allows for substantial improvement of the quality of genetic estimates. It also allows for 
addressing new fundamental research questions about the nature of aging-related changes in humans. 


3.3.1. Combining Demographic and Genetic Data Improves the Accuracy of Genetic Analyses 

A critical analysis of genetic association studies of human aging and longevity conducted to date revealed underutilized 
reserves in the data that can be used to improve the accuracy of estimated associations with human longevity. One such reserve 
is the data on all-cause mortality. The benefits of using such data are based on the fact that populations, from which the samples 
of study subjects were taken, are genetically heterogeneous groups of individuals. The age trajectories of mortality rates in such 
populations result from the selection process that took place in genetically heterogeneous cohorts. Each such cohort can be represented 
as a mixture of carriers and noncarriers of a selected genetic variant. The mortality rate at a given age in such a population is a weighted 
sum of mortality rates in subpopulations of carriers and noncarriers where the weights coincide with proportions of carriers and 
noncarriers in the cohort at this age. There is a one-to-one correspondence between the age trajectories of these proportions and 
mortality rates for the groups of carriers and noncarriers. This correspondence has a simple mathematical description that can be 
used in joint analyses of demographic and genetic data. Such analyses improve the accuracy of genetic estimates. The information 
for genetic analyses consists of demographic data for the nongenotyped part of the population and genetic data for the genotyped 
part of this population. The demographic information is presented by data on total mortality. 
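
The correspondence between carrier proportions and group-specific mortality rates can be written out explicitly. The following is a sketch in notation assumed here for illustration (not necessarily that of the cited papers): mortality rates $\mu_1(x)$ and $\mu_0(x)$ with survival functions $S_1(x)$ and $S_0(x)$ for carriers and noncarriers, the proportion of carriers at birth $p_0$, and the proportion of carriers among survivors to age $x$, $\pi(x)$.

```latex
\begin{aligned}
S_g(x) &= \exp\Bigl(-\int_0^x \mu_g(u)\,du\Bigr), \qquad g = 0, 1,\\
\pi(x) &= \frac{p_0\, S_1(x)}{p_0\, S_1(x) + (1 - p_0)\, S_0(x)},\\
\mu(x) &= \pi(x)\,\mu_1(x) + \bigl(1 - \pi(x)\bigr)\,\mu_0(x),\\
\frac{d\pi(x)}{dx} &= -\,\pi(x)\bigl(1 - \pi(x)\bigr)\bigl(\mu_1(x) - \mu_0(x)\bigr).
\end{aligned}
```

The last line follows from differentiating $\ln[\pi(x)/(1-\pi(x))] = \ln[p_0/(1-p_0)] + \ln S_1(x) - \ln S_0(x)$. It shows that the age trajectory of the carrier proportion determines the difference of the two mortality rates, and together with total mortality $\mu(x)$ it determines each rate separately.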

The population of nongenotyped individuals is considered as a mixture of carriers and noncarriers of a selected genetic variant. 
The genetic data for genotyped individuals are represented by the numbers of carriers and noncarriers of the selected genetic variant 
at different ages. The carriers and noncarriers of this variant have mortality rates $\mu_1(x)$ and $\mu_0(x)$, respectively. The total 
mortality $\mu(x)$ is represented as a weighted sum of the two mortality rates, $\mu(x) = \pi(x)\mu_1(x) + (1 - \pi(x))\mu_0(x)$, where 
$\pi(x)$ is the proportion of carriers at age $x$. The null hypothesis is that the two mortality rates are the same. This hypothesis 
can be tested in a genetic association study by maximizing the joint likelihood of demographic and genetic data and using the 
likelihood-ratio test (Yashin et al. 1999a; Yashin et al. 2000; Yashin et al. 2007d). 

FIGURE 2. Distribution of Age at Time of Biospecimen Collection. Source: Framingham Heart Study (a limited access dataset), original cohort. 


3.3.2. The Age Structure of the Population under Study at the Time of Biospecimen Collection 

The age structure of the group of genotyped study subjects at the time of biospecimen collection provides additional 
information about the genetics of life span. This information can also be used to improve the accuracy of genetic analyses. The 
idea rests on the fact that participants in prospective studies usually have different ages at the time of biospecimen collection 
(see Fig. 2). To illustrate the benefits of joint analyses of the follow-up data and the data on ages at the time 
of biospecimen collection compared to analyses of the follow-up data alone, we performed a simulation study assuming that 
carriers and noncarriers of some hypothetical allele in a population have the Cox-type mortality rates $\mu(x|G) = \mu_0(x)e^{\gamma G}$, 
where $G = 0$ for noncarriers and $G = 1$ for carriers, and the baseline mortality is the Gompertz function, $\ln \mu_0(x) = \ln a + bx$. 
In the simulation, we used $\ln a = -9.0$ and $b = 0.08$ to produce reasonable survival patterns corresponding to human populations, and 
the proportion of carriers at birth $p_0 = 0.25$. The parameter $\gamma$ varied from -0.5 to 0.5 in steps of 0.1 to simulate scenarios 
with different effect sizes. For each set of model parameters defined above, we generated life spans of 4,500 individuals from 
the respective probability distributions: those corresponding to the hazard $\mu_0(x)e^{\gamma}$ for carriers and $\mu_0(x)$ for 
noncarriers. Then we assigned the hypothetical “ages at entry” into the study, which are also considered ages at the time of 
biospecimen collection, uniformly distributed over the interval between 40 and 100 years. Individuals with simulated life spans 
exceeding “age at entry” plus six years were considered censored at the “age at entry” plus six. Such a design resembles the Long 
Life Family Study (Yashin et al. 2010b). This procedure was repeated 1,000 times to generate 1,000 datasets (in each scenario with 
the respective $\gamma$). We then analyzed these data using the parts of the likelihood functions from Arbeev et al. (2011) containing 
(1) only follow-up information and (2) follow-up information and information on ages at biospecimen collection. We calculated the 
power, that is, the proportion of datasets for which the null hypothesis $\gamma = 0$ was rejected at the 0.05 level, for these two 
methods and for different effect sizes, that is, the values of the regression parameter $\gamma$. The results are shown in Table 1. 
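
One way to generate a single dataset of this kind is sketched below. This is an illustration under stated assumptions, not the code used in the study: it assumes that each participant's genotype is drawn from the carrier proportion among survivors to the entry age and that the life span is drawn conditional on survival to that age; the function names and the random seed are hypothetical.

```python
import numpy as np

LN_A, B, P0 = -9.0, 0.08, 0.25  # Gompertz parameters and carrier share at birth from the text


def cum_hazard(x, log_rr):
    """Gompertz cumulative hazard from age 0 to x with relative risk exp(log_rr)."""
    return np.exp(LN_A + log_rr) / B * np.expm1(B * x)


def carrier_share(x, gamma):
    """Proportion of carriers among survivors to age x."""
    s1 = np.exp(-cum_hazard(x, gamma))
    s0 = np.exp(-cum_hazard(x, 0.0))
    return P0 * s1 / (P0 * s1 + (1.0 - P0) * s0)


def simulate(n, gamma, follow_up=6.0, seed=0):
    """Simulate entry ages, genotypes, exit ages, and death indicators."""
    rng = np.random.default_rng(seed)
    entry = rng.uniform(40.0, 100.0, n)  # ages at biospecimen collection
    # Assumption: genotype frequencies at entry reflect selection by mortality.
    g = (rng.random(n) < carrier_share(entry, gamma)).astype(int)
    # Inverse-transform sampling conditional on survival to the entry age:
    # solve H(T) - H(entry) = e with e ~ Exp(1).
    e = rng.exponential(1.0, n)
    life = np.log(np.exp(B * entry) + B * e / np.exp(LN_A + gamma * g)) / B
    exit_age = np.minimum(life, entry + follow_up)  # censor six years after entry
    died = life <= entry + follow_up
    return entry, g, exit_age, died


entry, g, exit_age, died = simulate(4500, gamma=0.3)
```

Repeating `simulate` with different seeds and fitting both likelihood variants to each dataset would give the rejection proportions that define power; the fitting step is omitted here because it depends on the likelihood in Arbeev et al. (2011).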

TABLE 1 
Power in Simulation Studies 

γ       RR      Follow-up only   Follow-up and ages 
-0.5    0.607   1.000            1.000 
-0.4    0.670   0.997            1.000 
-0.3    0.741   0.940            1.000 
-0.2    0.819   0.663            0.938 
-0.1    0.905   0.198            0.435 
 0.0    1.000   0.058            0.043 
 0.1    1.105   0.223            0.464 
 0.2    1.221   0.666            0.958 
 0.3    1.350   0.953            1.000 
 0.4    1.492   0.996            1.000 
 0.5    1.649   1.000            1.000 

Note: Power in simulation studies is illustrated for two methods: (1) only follow-up information and (2) follow-up information and information 
on ages at biospecimen collection. 

The results shown in Table 1 indicate that the use of information on ages at biospecimen collection in addition to the follow-up 
data gives a substantial increase in power compared to the traditional approach that uses the follow-up data only (note that these 
analyses can also be applied to data on ages at disease onset instead of life span data). The effect, however, depends on the 
duration of the follow-up period. In the case of a longer follow-up period, the relative contribution of the data on ages at the time 
of biospecimen collection to the improvement of the accuracy of statistical estimates will be smaller. Conversely, in the case of 
a shorter follow-up period, the data on ages at biospecimen collection play a more important role in differentiating the allele- or 
genotype-specific survival patterns from the data. The results of these analyses show that the approach described above may have 
important implications for GWA studies of human aging, health, and longevity, especially in cases with short follow-up periods 
(Yashin et al. 2013b). 

The application of this approach to the analyses of data on 24 vulnerability alleles (corresponding to one of the QC procedures 
in our GWAS of longevity in FHS; see next section) resulted in a substantial increase in the statistical significance of the detected 
associations. Application of the method that, in addition to follow-up data on genotyped individuals and ages at biospecimen 
collection, utilizes follow-up data on nongenotyped individuals (Arbeev et al. 2011) results in an additional increase in the accuracy 
and power of estimates in such joint analyses compared to analyses based on the genetic subsample alone. Application of this method 
to the analysis of the effect of the common APOE polymorphism on survival using combined genetic and nongenetic subsamples of the FHS 
original cohort data showed an important result: female, but not male, carriers of the APOE e4 allele have significantly worse 
survival than noncarriers, whereas empirical analyses did not attain significant results for either sex (Arbeev et al. 2011). 


3.4. Twenty-four Vulnerability Alleles 

Let us investigate properties of 24 genetic variants that showed associations with life span in both male and female study 
participants of the original FHS cohort in GWAS characterized by the QC procedure with sample call rate >90%, SNP call rate 
>90%, HWE >10~* and MAF >5% (see Fig. 1). Note that all these variants have negative associations with life span, so they will 
be called “frailty” or “vulnerability” alleles. 

Table 2 illustrates properties of 24 selected variants (SNPs). The first column shows the SNP rs-number, the second the 
chromosome number; the third and the fourth the number of minor alleles and the total number of alleles for the corresponding 
SNP, respectively; and the fifth, sixth, and seventh columns show the SNPs’ minor allele frequencies in our study, HapMap, and 
“1000 genome” databases. 

One can see from Table 2 that chromosomes 2, 3, 4, 6, 7, 13, 15, and 18 are not represented by the selected SNPs. Chromosome 
11 is represented by four SNPs: rs1440483, rs1794108, rs5743998, and rs9971555. Three SNPs are on chromosome 17: rs2586484, 
rs8081943, and rs9896996. Chromosome 12 is represented by two SNPs, rs1399453 and rs10845099; two are on chromosome 20, 
rs6090342 and rs11536959; two are on chromosome 5, rs356430 and rs17067605; two are on chromosome 16, rs9925881 and rs9928967; 
and two are on chromosome 22, rs6007952 and rs8135777. Chromosomes 1, 8, 9, 10, 14, 19, and 21 have one SNP each: rs3738682, 
rs2353447, rs4565533, rs7894051, rs4904670, rs5491, and rs2838566, respectively. 
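
The chromosome distribution can be checked mechanically against Table 2. The short sketch below tallies SNPs per chromosome from the (rs number, chromosome) pairs transcribed from that table; the variable names are illustrative only.

```python
from collections import Counter

# (rs number, chromosome) pairs transcribed from Table 2
TABLE2 = [
    ("rs5491", 19), ("rs356430", 5), ("rs1399453", 12), ("rs1440483", 11),
    ("rs1794108", 11), ("rs2353447", 8), ("rs2586484", 17), ("rs2838566", 21),
    ("rs3738682", 1), ("rs4565533", 9), ("rs4904670", 14), ("rs5743998", 11),
    ("rs6007952", 22), ("rs6090342", 20), ("rs7894051", 10), ("rs8081943", 17),
    ("rs8135777", 22), ("rs9896996", 17), ("rs9925881", 16), ("rs9928967", 16),
    ("rs9971555", 11), ("rs10845099", 12), ("rs11536959", 20), ("rs17067605", 5),
]

# Number of selected SNPs on each chromosome
snps_per_chromosome = Counter(chrom for _, chrom in TABLE2)

# Autosomes with no selected SNP
unrepresented = [c for c in range(1, 23) if c not in snps_per_chromosome]

for chrom in sorted(snps_per_chromosome):
    names = [rs for rs, c in TABLE2 if c == chrom]
    print(chrom, len(names), ", ".join(names))
print("not represented:", unrepresented)
```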


3.4.1. Sensitivity Analyses 

Tables 3 and 4 show the results of additional genetic association studies that involve 24 detected variants. To test how sensitive 
the results of GWAS are to the life span imputation, we modified the life span data by considering ages at censoring for 203 censored 
individuals as ages at death and performed GWAS of the modified data. These analyses resulted in 21 genome-wide significant SNPs 
common for males and females. Fifteen out of these 21 SNPs also belong to the set of 24 SNPs detected earlier. 
These 15 SNPs are indicated by an asterisk in Tables 3 and 4. The associations of other variants with life span remain nominally 
statistically significant. 

TABLE 2 
Properties of 24 Selected SNPs 

rs number    Chr   No. MA   No. A   MAF        MAF HP   MAF 1000 
rs5491       19    224      2128    0.10563    0        0.075 
rs356430     5     218      2020    0.107921   0        0.017 
rs1399453    12    225      2052    0.109649   0        0.024 
rs1440483    11    190      2064    0.092054   0        0.054 
rs1794108    11    159      2104    0.07557    0        0 
rs2353447    8     230      2092    0.109943   0        0.02 
rs2586484    17    236      2074    0.11379    0.008    0.012 
rs2838566    21    252      2098    0.120114   0        0.11 
rs3738682    1     166      2026    0.081935   0.017    0.011 
rs4565533    9     447      2088    0.21408    0.09     0.06 
rs4904670    14    291      2120    0.137264   0        0.03 
rs5743998    11    198      2044    0.096869   0        0.012 
rs6007952    22    356      2058    0.172983   0.05     0.06 
rs6090342    20    266      2060    0.129126   0        0.28 
rs7894051    10    426      2136    0.199438   0.05     0.1 
rs8081943    17    148      2176    0.068015   0        0.03 
rs8135777    22    216      1996    0.108216   0        0.023 
rs9896996    17    209      2082    0.100384   0.035    0.04 
rs9925881    16    144      2068    0.069632   0        0.05 
rs9928967    16    137      2140    0.064019   0        0.03 
rs9971555    11    232      2092    0.110899   0        0.02 
rs10845099   12    380      2072    0.183398   0.093    0.323 
rs11536959   20    155      2132    0.072702   0        0.017 
rs17067605   5     167      2046    0.081623   0        0.004 

Source: Framingham Heart Study (a limited access dataset). 

Note: SNPs selected according to one of the QC procedures whose minor alleles have significant negative associations with life span. The 
columns in the table denote (1) SNP number; (2) chromosome number; (3) number of minor alleles in a sample; (4) the total number of alleles in 
a sample; (5) minor allele frequencies in a sample; (6) minor allele frequencies in HapMap; and (7) minor allele frequencies in the 1,000 Genome 
Project. 

The p values and standard errors calculated by statistical estimation algorithms are based on the assumption of normality of 
the data used in the analyses. The empirical distributions of life span for males and females usually deviate from normal. To make 
them more “normalized,” we performed a Box-Cox transformation of these distributions. The GWAS of transformed data resulted 
in 43 negatively associated SNPs for females and 33 such SNPs for males. The intersection of these sets with the set of 24 
SNPs resulted in 12 SNPs. These SNPs are indicated by a hash symbol in Tables 3 and 4. 
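
For readers unfamiliar with the transformation, a minimal sketch of the one-parameter Box-Cox transform is given below. The text does not state how the transformation parameter was chosen, so `lam` is left as an input; the function name and the example values are hypothetical.

```python
import numpy as np


def box_cox(x, lam):
    """One-parameter Box-Cox transform of strictly positive values."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(x)          # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam


# Hypothetical life spans (years), transformed with an illustrative parameter value
life_spans = np.array([62.0, 71.5, 80.0, 88.0, 95.5])
transformed = box_cox(life_spans, lam=2.0)
```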

To test whether the selected genetic variants have a significant association with life span without using the life span imputation 
procedure, we performed additional genetic analyses by combining the follow-up data on life span in the original FHS cohort with data 
on genetic frequencies from the distribution of age at blood collection. The method is described in Section 3.3.2. In these analyses 
the mortality rates for carriers and noncarriers of each of the 24 preselected alleles were described by Gompertz curves. The 
parameters of these curves were estimated, and the null hypotheses about equality of mortality rates for carriers and noncarriers 
of each of the 24 genetic variants were tested. The results of these analyses are shown in Tables 3 and 4. They indicate that the 
difference in mortality patterns between carriers and noncarriers of the corresponding genetic variants is highly statistically significant. 
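
To make the Gompertz parameterization concrete, the sketch below evaluates the fitted curves for one SNP. It assumes the form $\mu(x) = a e^{bx}$ given in the table notes and uses the male estimates for rs9971555 from Table 3; the function names are illustrative and are not taken from the study's software.

```python
import math


def gompertz_hazard(x, ln_a, b):
    """Mortality rate mu(x) = a * exp(b * x)."""
    return math.exp(ln_a + b * x)


def gompertz_survival(x, ln_a, b, x0=0.0):
    """Probability of surviving to age x given survival to age x0."""
    a = math.exp(ln_a)
    return math.exp(-a / b * (math.exp(b * x) - math.exp(b * x0)))


def crossover_age(ln_a1, b1, ln_a0, b0):
    """Age at which two Gompertz mortality curves intersect."""
    return (ln_a1 - ln_a0) / (b0 - b1)


# Male estimates for rs9971555 from Table 3: (ln a, b)
CARRIER = (-6.12, 0.054)
NONCARRIER = (-8.73, 0.077)

for age in (60, 70, 80, 90):
    ratio = gompertz_hazard(age, *CARRIER) / gompertz_hazard(age, *NONCARRIER)
    print(age, round(ratio, 2))
print("curves cross at age", round(crossover_age(*CARRIER, *NONCARRIER), 1))
```

With these estimates the carrier curve starts higher and rises more slowly, so the ratio of mortality rates shrinks with age and the two curves intersect only at a very old age.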


3.4.2. Polygenic Score Index 

TABLE 3 
Parameter Estimates of Gompertz’s Approximations of Survival Functions for Males 

SNP            ln_a_1   b_1     ln_a_0   b_0     p value 

rs6007952#,*   -5.55    0.045   -8.79    0.078   4.06E-09 
rs9971555#,*   -6.12    0.054   -8.73    0.077   9.33E-14 
rs9896996*     9.2)     0.044   -8.57    0.075   2.49E-12 
rs2353447#,*   -6.20    0.055   -8.38    0.073   2.12E-12 
rs4904670#,*   -6.24    0.056   -9.81    0.088   0.00E+00 
rs6090342#,*   -6.13    0.054   -8.74    0.077   1.12E-13 
rs17067605     -5.25    0.044   -8.75    0.078   1.55E-14 
rs1440483      -7.25    0.070   -8.65    0.076   0.00E+00 
rs2838566#,*   -6.06    0.054   -9.47    0.085   0.00E+00 
rs9925881      -5.65    0.049   -8.21    0.071   2.75E-11 
rs4565533#,*   -5.798   0.048   -9.56    0.086   2.66E-13 
rs5743998      -6.03    0.053   -8.23    0.072   6.10E-11 
rs3738682      -6.15    0.056   -8.38    0.075   4.63E-14 
rs7894051#,*   -6.00    0.051   -9.76    0.088   1.79E-14 
rs2586484*     -6.40    0.057   -8.55    0.075   9.06E-14 
rs356430#,*    -7.10    0.067   -8.26    0.072   2.80E-13 
rs10845099*    -6.30    0.054   -8.84    0.078   7.22E-12 
rs8135777#,*   -5.74    0.050   -8.54    0.075   8.03E-13 
rs11536959     -5.30    0.045   -8.32    0.073   2.54E-12 
rs1399453#,*   -5.85    0.051   -8.80    0.078   1.99E-14 
rs9928967      -5.12    0.043   -8.45    0.074   1.41E-13 
rs5491#,*      -6.37    0.056   -8.00    0.069   4.18E-09 
rs1794108      -5.50    0.048   -8.55    0.075   3.50E-14 
rs8081943      -5.74    0.050   -8.54    0.075   8.03E-13 

Source: Framingham Heart Study (a limited access dataset). 

Note: Column “SNP” shows the rs number of the SNP. The symbols in this column are as follows: #: minor allele of the corresponding SNP 
had a genome-wide significant association with Box-Cox transformed life spans; *: minor allele of the corresponding SNP showed a genome-wide 
significant association with modified life span data (i.e., with ages at censoring for 203 censored individuals considered as ages at death). 
Columns “ln_a_1” and “b_1” show the estimates of parameters of the Gompertz mortality curve $\mu(x) = a e^{bx}$ for carriers of the minor allele 
of the corresponding SNP. Columns “ln_a_0” and “b_0” show the estimates of parameters of the Gompertz mortality curve for noncarriers of the 
minor allele of the corresponding SNP. Column “p value” shows p values for testing the null hypothesis about the equality of the survival 
functions among carriers and noncarriers. 

Using 24 selected genetic variants negatively associated with life span, we constructed additive polygenic risk score indices. 
For this purpose, for each study participant we calculated the weighted sum of the numbers of detected genetic variants carried 
by this person, using estimated effect sizes as weights. We also constructed a simplified polygenic score index by counting the 
number of vulnerability alleles carried by each genotyped study participant (genetic dose index) (see details in Supplementary 
Materials in Yashin et al. 2012a). Figure 3 shows Kaplan-Meier estimates of conditional survival functions together with 95% 
confidence intervals for three groups of individuals who survived to age 80: (1) carriers of two or fewer vulnerability alleles, 
(2) carriers of more than two vulnerability alleles, and (3) the total population. One can see from this figure that individuals 
carrying a smaller number of vulnerability alleles have better survival functions than average individuals and those who have a 
larger number of vulnerability alleles. 
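
The two indices described above can be sketched as follows. This is an illustration only: it assumes that genotypes are coded as counts of the minor (vulnerability) allele per SNP, and the function name, the toy genotype matrix, and the effect sizes are hypothetical rather than taken from the study.

```python
import numpy as np


def polygenic_scores(genotypes, effect_sizes):
    """Return (weighted score, allele count) for each person.

    genotypes: array of shape (persons, SNPs) with minor-allele counts 0, 1, or 2.
    effect_sizes: one estimated effect size per SNP, used as weights.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(effect_sizes, dtype=float)
    weighted = genotypes @ weights      # additive weighted polygenic score
    dose = genotypes.sum(axis=1)        # simplified "genetic dose" index
    return weighted, dose


# Toy example with three SNPs and two persons (hypothetical values)
geno = [[0, 1, 2],
        [1, 0, 0]]
beta = [0.1, 0.2, 0.3]
weighted, dose = polygenic_scores(geno, beta)

# Grouping used for the survival comparison: two or fewer alleles versus more than two
low_dose_group = dose <= 2
```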

In our earlier study, biodemographic analyses of genetic regulation of life span were done using data on longevity alleles 
selected in the GWAS of human life span by applying several different statistical models to the FHS data (Yashin et al. 2012b). 
Age-specific survival curves considered as functions of the number of longevity alleles exhibited regularities known in demography 
as “rectangularization” of survival curves. An important finding was that the presence of such a pattern confirms the results from 
theoretical and experimental studies about the connection between longevity and stress resistance. Biodemographic analyses can thus 
provide important insights into the properties of genes affecting phenotypic traits (Yashin et al. 2012a). 


3.5. Average Age Trajectories of Physiological Variables in Total Genotyped Sample and in Carriers and Noncarriers of 
Vulnerability Alleles 

The inclusion of longitudinal data in the analyses provides new insights into biological mechanisms that mediate effects 
of various influential factors on life span. In particular, using these data one can evaluate and compare average age trajectories of 
physiological indices and other biomarkers for different groups of study participants. Figure 4 shows average age trajectories of 
physiological indices for males and females. One can see from this figure that average trajectories for BG and PP were about the 
same for males and females, with a slight difference between ages 65 and 80 for BG and for PP before age 45, where female PP 
values were lower, and after age 70, where female PP values were higher. 

TABLE 4 
Parameter Estimates of Gompertz’s Approximations of Survival Functions for Females 

SNP            ln_a_1   b_1     ln_a_0   b_0     p value 

rs6007952#,*   -5.91    0.046   -11.28   0.102   2.33E-15 
rs9971555#,*   -5.22    0.040   -10.78   0.097   1.44E-15 
rs9896996*     -4.48    0.031   -10.85   0.098   1.11E-15 
rs2353447#,*   -5.00    0.038   -11.41   0.103   0.00E+00 
rs4904670#,*   -5.86    0.048   -11.52   0.104   0.00E+00 
rs6090342#,*   -5.62    0.044   -11.08   0.100   0.00E+00 
rs17067605     -4.79    0.036   -10.55   0.094   3.33E-16 
rs1440483      -4.81    0.036   -10.76   0.096   0.00E+00 
rs2838566#,*   -4.88    0.036   -11.56   0.105   0.00E+00 
rs9925881      -5.12    0.040   -10.03   0.089   3.22E-12 
rs4565533#,*   -5.97    0.047   -12.45   0.114   0.00E+00 
rs5743998      -4.74    0.034   -10.82   0.097   3.33E-16 
rs3738682      -4.95    0.038   -10.65   0.095   0.00E+00 
rs7894051#,*   -5.90    0.047   -12.37   0.113   0.00E+00 
rs2586484*     -5.19    0.039   -10.81   0.097   3.89E-15 
rs356430#,*    -5.45    0.043   -10.61   0.095   7.77E-16 
rs10845099*    -5.36    0.039   -11.26   0.102   2.33E-12 
rs8135777#,*   -5.61    0.044   -10.56   0.094   6.00E-15 
rs11536959     -5.19    0.040   -10.21   0.090   5.02E-13 
rs1399453#,*   -5.55    0.044   -10.61   0.095   1.44E-15 
rs9928967      -4.21    0.029   -10.37   0.092   1.82E-14 
rs5491#,*      -6.34    0.054   -10.56   0.094   0.00E+00 
rs1794108      -4.81    0.036   -10.38   0.092   1.38E-14 
rs8081943      -4.19    0.029   -10.52   0.094   0.00E+00 

Source: Framingham Heart Study (a limited access dataset). 

Note: Estimates of parameters of Gompertz’s approximations of survival functions for carriers and noncarriers of each of 24 vulnerability 
alleles for genotyped female participants of the original FHS cohort. Column “SNP” shows the rs number of the SNP. The symbols in this column 
are #: minor allele of the corresponding SNP had a genome-wide significant association with Box-Cox transformed life spans; *: minor allele of 
the corresponding SNP showed a genome-wide significant association with modified life span data (i.e., with ages at censoring for 203 censored 
individuals considered as ages at death). Columns “ln_a_1” and “b_1” show the estimates of parameters of the Gompertz mortality curve 
$\mu(x) = a e^{bx}$ for carriers of the minor allele of the corresponding SNP. Columns “ln_a_0” and “b_0” show the estimates of parameters of 
the Gompertz mortality curve $\mu(x) = a e^{bx}$ for noncarriers of the minor allele of the corresponding SNP. Column “p value” shows p values 
for testing the null hypothesis about the equality of the survival functions among carriers and noncarriers. 


The average BMI values for females were lower than those for males until age 75. After this age, the curves practically coincide. 
The values of CH were lower for females until age 45 and then became higher until the end of the observation interval. The values 
of DBP were lower in females until age 75. After this age, the curves became indistinguishable. The SBP was lower for females 
until age 50. Then the male and female curves practically coincide until age 75. After this age, the female SBP curve became 
higher than that for males. The H curve for males is higher than that for females over the entire interval, and the VR curve is higher 
for females over the entire age interval. It is clear that the observed difference between males and females is partly of genetic origin. 
It is also likely that males and females experienced different exposure to external conditions. 

Using data on detected genetic variants associated with life span one can evaluate and compare age trajectories of physiological 
indices for groups of study subjects having different genetic backgrounds. Figures 5 and 6 show average age trajectories of 
physiological variables for male carriers and noncarriers of minor alleles of rs5491 and rs9925881 SNPs from the original FHS 
cohort. 

These two alleles were selected to illustrate the difference in their associations with age trajectories of physiological variables; 
therefore, trajectories for female carriers and noncarriers are not shown. One can see from these figures that the main difference 
between carriers and noncarriers of the rs5491 SNP is in the age trajectories of BMI and CH, and for the rs9925881 SNP, it is in the 
average age trajectories of systolic and diastolic blood pressure. 
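
Average age trajectories of this kind amount to group means within age bins. A minimal sketch is given below, assuming one record per examination with the person's age, carrier status, and the measured value; the bin width, the function name, and the toy records are hypothetical and do not reproduce the study's procedure.

```python
from collections import defaultdict


def mean_trajectories(records, bin_width=5):
    """Average a physiological variable by carrier status and age bin.

    records: iterable of (age, carrier, value) tuples, one per examination.
    Returns {(carrier, bin_start_age): mean value}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for age, carrier, value in records:
        key = (carrier, int(age // bin_width) * bin_width)
        totals[key][0] += value
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in sorted(totals.items())}


# Toy BMI measurements (hypothetical): (age, carrier status, value)
exams = [(61, 1, 26.0), (63, 1, 28.0), (62, 0, 25.0), (67, 0, 27.0)]
trajectories = mean_trajectories(exams)
```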


FIGURE 3. Survival Functions of Individuals with Different Dose of Vulnerability Alleles. Source: Framingham Heart Study (a limited access dataset), original 
cohort. 
[Plot: survival function versus age for females and males, with 95% CI. Curves: number of SNPs <= 2; number of SNPs > 2; genotyped individuals.] 


3.6. Genetic Analyses of Longitudinal Data on Aging, Health, and Longevity Using the Genetic Version of the 
Stochastic Process Model, GenSPM 

The empirical evaluation of age trajectories of physiological indices for different groups of study subjects allows for capturing 
and analyzing differences in age trajectories of these indices among the groups. Such analyses, however, do not allow for 
studying the behavior of hidden biomarkers of aging-related changes that are involved in mechanisms regulating age trajectories of 
physiological indices measured in longitudinal studies. Many such components were discussed in the gerontological literature but 
were never analyzed together as parts of one mechanism of aging-related changes. They include the notions of age-specific 
physiological norms (Lewington et al. 2002; Palatini 1999; Westin and Heath 2005), allostasis and allostatic load (Karlamangla 
et al. 2006; Seeman et al. 2001), the adaptive capacity (Lund et al. 2002; Troncale 1996), stress resistance with age (Hall et al. 2000; 
Ukraintseva and Yashin 2003; Yashin et al. 2007a), and stochasticity, for example, stochasticity associated with erratic behavior 
of physiological parameters (Goldberger et al. 2002). To investigate mechanisms of aging-related changes with hidden and partly 
observed components, a dynamic model describing such changes and their connection with health and survival outcomes is 
needed. Versions of such a model were investigated in several papers (Yashin et al. 2011a,c; Yashin et al. 2007a; Yashin et al. 
2008a,b; Yashin et al. 2012c,d). 

The key part of such models is mortality risk considered as function of physiological variables. Many epidemiological studies 
provide evidence of U- or J-shaped risks as functions of different physiological characteristics of health (Allison et al. 1997; 
Boutitie et al. 2002; Kulminski et al. 2008; Kuzuya et al. 2008; Mazza et al. 2007; Okumiya et al. 1999; Protogerou et al. 2007; 
Troiano et al. 1996; Witteman et al. 1994). Therefore, the use of such quadratic (U- or J-shaped) hazards in the analyses is 
biologically meaningful. An important class of models for joint analyses of longitudinal and time-to-event data uses a stochastic 
process for description of longitudinal measurements and a quadratic hazard as a function of physiological variables. An initial 
version of such models was put forth in a 2007 report (Yashin et al. 2007a). The model’s various extensions have been formulated 
and applied in different contexts to investigate mechanisms of aging-related changes in connection with morbidity or mortality 
risks (Akushevich et al. 2012; Arbeev et al. 2009; Arbeev et al. 2011; Arbeev et al. 2012; Tolley 2012; Yashin et al. 2011a,b,c; 
Yashin et al. 2008; Yashin et al. 2012c,d; Yashin et al. 2010a; Yashin et al. 2013a; Yashin et al. 2009). The advantage of this 
approach is that it allows for incorporating the concepts and mechanisms of aging-related changes mentioned above on the basis 
of the common framework provided by this model. 

FIGURE 4. Average Age Trajectories of Physiological Variables for Genotyped Males and Females. Source: Framingham Heart Study (a limited access dataset), 
original cohort; pooled data from exams 1 to 28. 

The version of SPM for analyses of genetic data (GenSPM) was developed in a 2009 effort (Arbeev et al. 2009). This GenSPM 
permits the following: 


(1) Joint analyses of genotyped and nongenotyped subsamples of longitudinal data to make use of all available information and 
to increase the accuracy and/or power of estimates compared to analyses of genotyped subsample alone 

(2) Evaluation of indirect genetic effects, for example, associated with unobservable or unmeasured risk factors, mediated by age 
trajectories of physiological variables collected in a longitudinal study and 

(3) Incorporation of concepts and mechanisms of systems biology underlying aging-related changes in organisms that are not 
directly measured in longitudinal data but can be estimated from individual age trajectories of physiological variables and 
time-to-event data. 


Specifically, this model permits evaluation of hidden (unobserved) biomarkers driving individual physiological change and 
affecting population characteristics. These include the aforementioned concepts of age-specific physiological norms, the allostatic 
load, homeostenosis, the decline in stress resistance with age, short-scale stochasticity, and respective hazard rates, for carriers 
and noncarriers of a selected allele or genotype. This model can be straightforwardly extended to incorporate “static” covariates 
to evaluate their modulating role in life course genetic effects. 

FIGURE 5. Physiological Variables in Male Carriers/Noncarriers of Minor Allele of rs5491. Source: Framingham Heart Study (a limited access dataset), original 
cohort. 


3.6.1. Model Description 

The GenSPM (Arbeev et al. 2009) was applied to the analyses of longitudinal data on serum cholesterol and diastolic blood 
pressure for APOE and non-APOE subsamples of the original FHS cohort. The details of model construction and the likelihood 
maximization procedure are described in the report (Arbeev et al. 2009). Here we give only a brief description of the model to 
help understand the research results. 

The evolution of physiological variables $Y_t$ over age $t$ is described by the stochastic differential equation 


$$dY_t = a(t, G)\,(Y_t - f_1(t, G))\,dt + B(t, G)\,dW_t \qquad (1)$$


with the initial condition $Y_{t_0} \sim N(f_1(t_0, G), \sigma_{0G})$. Here $G$ ($G = 0, 1$; $P(G = 1) = p_1$) is a discrete random 
variable characterizing the absence ($G = 0$) or presence ($G = 1$) of the APOE e4 allele in a person’s genome, and $W_t$ is a Wiener 
process independent of $Y_{t_0}$ and $G$. The coefficient $B(t, G)$ was considered constant ($B(t, G) = \sigma_{1G}$) in these applications. 

The effect of allostatic adaptation $f_1(t, G)$ (Arbeev et al. 2009; Yashin et al. 2007a) is described as a quadratic function of $t$: 
$f_1(t, G) = a_{f_1}^G + b_{f_1}^G t + c_{f_1}^G t^2$. This choice comes from the empirical observations of the average trajectories of 
the physiological variables in the FHS, which generally have a quadratic form, although, of course, these average trajectories do not 
necessarily have to follow $f_1(t, G)$. 

The negative feedback coefficient $a(t, G)$ characterizes the strength of homeostatic forces. The decline in the absolute value 
of this coefficient with age represents the decline in the adaptive (homeostatic) capacity with age (“homeostenosis”), which has 
been shown to be an important characteristic of aging (Hall et al. 2000; Lund et al. 2002; Rankin and Kushner 2009; Troncale 
1996). We used a linear approximation of this coefficient as a function of age: $a(t, G) = a_Y^G + b_Y^G t$ (with $a_Y^G < 0$ and 
$b_Y^G > 0$). 

FIGURE 6. Physiological Variables in Male Carriers/Noncarriers of Minor Allele of rs9925881. Source: Framingham Heart Study (a limited access dataset), 
original cohort. 
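
The dynamics in Equation (1) can be explored numerically with a simple Euler-Maruyama scheme. The sketch below is an illustration for one genotype group under stated assumptions: constant diffusion, a linear feedback coefficient, and a quadratic allostatic trajectory as described above. All parameter values, the time scale (years since baseline), and the function name are hypothetical and are not the estimates reported in this article.

```python
import numpy as np


def simulate_trajectory(t0, t1, dt, a_y, b_y, f1_coef, sigma1, y0, rng):
    """Euler-Maruyama simulation of dY = a(t)(Y - f1(t)) dt + sigma1 dW."""
    n = int(round((t1 - t0) / dt))
    t = t0 + dt * np.arange(n + 1)
    y = np.empty(n + 1)
    y[0] = y0
    af, bf, cf = f1_coef
    for k in range(n):
        a = a_y + b_y * t[k]                      # negative feedback coefficient a(t)
        f1 = af + bf * t[k] + cf * t[k] ** 2      # allostatic trajectory f1(t)
        noise = sigma1 * np.sqrt(dt) * rng.normal()
        y[k + 1] = y[k] + a * (y[k] - f1) * dt + noise
    return t, y


rng = np.random.default_rng(42)
t, y = simulate_trajectory(
    t0=0.0, t1=50.0, dt=0.1,
    a_y=-0.2, b_y=0.001,              # a(t) stays negative, its absolute value declines
    f1_coef=(200.0, 1.0, -0.02),      # hypothetical quadratic trajectory
    sigma1=5.0, y0=210.0, rng=rng,
)
```

Because $a(t)$ is negative, deviations of $Y_t$ from $f_1(t)$ are pulled back toward it; as the absolute value of $a(t)$ declines with age, that pull weakens and the simulated path wanders further from the allostatic trajectory.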


The U or J shapes of the mortality and morbidity risks as functions of various physiological variables and other risk factors were 
confirmed in a number of studies (Allison et al. 1997; Boutitie et al. 2002; Kuzuya et al. 2008; Okumiya et al. 1999; Protogerou 
et al. 2007; van Uffelen et al. 2010). This indicates that a quadratic function can capture dependence of the risk on deviations of 
trajectories of a physiological variable $Y_t$ from its “optimal” values (Arbeev et al. 2009; Arbeev et al. 2011; Yashin et al. 2010a; 
Yashin et al. 2007a; Yashin et al. 2008a,b; Yashin et al. 2009). Such a function has been used to describe the mortality rate conditional 
on $Y_t$ and $G$: 


$$\mu(t, Y_t, G) = \mu_0(t, G) + (Y_t - f_0(t, G))^2\,\mu_1(t, G). \qquad (2)$$


Here $\mu_0(t, G)$ is the baseline hazard, and $f_0(t, G)$ are “optimal” trajectories (“physiological norms”). We used the gamma-Gompertz 
(logistic) baseline hazards $\mu_0(t, G)$: $\mu_0(t, G) = \mu_0^{\mathrm{inf}}(t, G) / (1 + \sigma_{2G}^2 \int_0^t \mu_0^{\mathrm{inf}}(u, G)\,du)$, 
where $\mu_0^{\mathrm{inf}}(t, G) = a_{\mu_0}^G e^{b_{\mu_0}^G t}$ (Vaupel et al. 1998). 

The coefficient $\mu_1(t, G)$ characterizes stress resistance. Its increase with age corresponds to the decline in stress resistance 
because it narrows the U shape of the risk, making an organism more vulnerable to deviations from the “optimal” values, which 
can be considered a manifestation of the senescence process (Robb et al. 2009; Semenchenko et al. 2004). In our analyses, 
$\mu_1(t, G)$ was approximated by a linear function of age: $\mu_1(t, G) = a_{\mu_1}^G + b_{\mu_1}^G t$. 
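
The quadratic hazard in Equation (2) can be evaluated with a few lines of code. The sketch below is for one genotype group and assumes the standard gamma-Gompertz form with the integral taken from age 0; the function names and all numerical values are hypothetical, chosen only to show the U shape of the risk.

```python
import math


def gamma_gompertz_baseline(t, a, b, sigma2_sq):
    """Baseline hazard: a*exp(b*t) / (1 + sigma2_sq * integral_0^t a*exp(b*u) du)."""
    cumulative = a / b * (math.exp(b * t) - 1.0)
    return a * math.exp(b * t) / (1.0 + sigma2_sq * cumulative)


def quadratic_hazard(t, y, f0, mu1, a, b, sigma2_sq=0.0):
    """Mortality rate: baseline plus the squared deviation from the norm times mu1."""
    return gamma_gompertz_baseline(t, a, b, sigma2_sq) + (y - f0) ** 2 * mu1


# Hypothetical values: age 70, physiological norm 200, three observed levels
for level in (170.0, 200.0, 230.0):
    print(level, quadratic_hazard(70.0, level, f0=200.0, mu1=1e-5, a=1e-4, b=0.08))
```

The hazard is smallest when the variable equals its norm and grows symmetrically with the deviation; a larger multiplier `mu1` narrows the U shape, which is how the model represents declining stress resistance.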

The average age trajectories of respective physiological variables in long-lived (life span >90 for females, >85 for males) 
carriers and noncarriers of the APOE e4 allele were considered as “optimal” trajectories in the model. 

The model specification allows for testing the hypotheses on the differences in aging-related characteristics, for example, 
adaptive capacity and mean allostatic trajectories, between carriers and noncarriers of the e4 allele, on the decline in adaptive 


Downloaded by [Library Services City University London] at 04:27 06 July 2016 


16 A. I. YASHIN ET AL. 


TABLE 5 
Estimates of Parameters of the Genetic Stochastic Process Model 
Multiplier in 
Baseline hazard quadr. part of Adaptive Mean allostatic trajectory 
(L0(t,G)) hazard (u;(t4G)) capacity (a(¢,G)) (f1(4G)) Other parameters 
Vari-able Allele In ag. be. oy at be ay by ay b§ Cy Of OF pi 
CH e4 —5.05* 0.042 0.01 —0.0079 0.0021 —0.165* 1.9937 258.771 1.016  —0.0556 51.11 24.18 0.302 
Noe4 —5.65 0.052 0.00 —0.0105 0.0026 —0.072 1.161 223.51 1.141 —0.0492 38.34 14.11 
DBP e4 —5.41' 0.067 0.00 —0.1297* 0.0371 —0.153 0.000 94.85' —0.073 —0.0107 13.92 6.97 0.300 


Noe4 —6.32 0.082 0.00 —0.1783' 0.0018 —0.150 0.000 80.52 0.199 —0.0115 9.16 5.07 


Source: Framingham Heart Study (a limited access dataset), original cohort. 

Note: Estimates of parameters of the genetic stochastic process model applied to data on mortality and longitudinal measurements of total cholesterol and diastolic blood pressure in female and male carriers (e4) and noncarriers (no e4) of the APOE e4 allele. The estimates of some parameters (including a_μ1 and b_Y) are rescaled by powers of 10 for better visibility in the table. The symbols after the numbers in the following columns denote p values (evaluated by the likelihood ratio test) for different null hypotheses. Column “ln a_μ0”: baseline hazard rates coincide in carriers and noncarriers of the e4 allele, i.e., μ0(t, no e4) = μ0(t, e4) (respective symbols are shown in rows no e4). Column “a_μ1”: zero quadratic part of the hazard (separately for carriers and noncarriers), i.e., μ1(t, no e4) = 0 for rows no e4, μ1(t, e4) = 0 for rows e4. Column “b_μ1”: age-independent U shapes of the hazard (separately for carriers and noncarriers), i.e., b_μ1 = 0 for noncarriers in rows no e4 and b_μ1 = 0 for carriers in rows e4. Column “a_Y”: adaptive capacities coincide in carriers and noncarriers, i.e., a(t, no e4) = a(t, e4) (respective symbols are shown in rows no e4). Column “b_Y”: no aging-related decline in the adaptive capacity (separately for carriers and noncarriers), i.e., b_Y = 0 for noncarriers in rows no e4 and b_Y = 0 for carriers in rows e4. Column “a_f1”: “mean allostatic trajectories” coincide in carriers and noncarriers, i.e., f1(t, no e4) = f1(t, e4) (respective symbols are shown in rows no e4). The symbols in these columns signify +: p < 0.0001; §: 0.0001 < p < 0.001; #: 0.001 < p < 0.01; *: 0.01 < p < 0.05, for respective null hypotheses. The absence of symbols after the numbers in these columns means that respective p values exceed 0.05. Note that all other columns in the table, except the columns mentioned above, are not used to represent information on testing any null hypotheses, and therefore they do not contain any symbols.


capacity with age, etc., using the likelihood ratio test. The likelihood optimization and the statistical tests have been performed 
using the optimization and statistical toolboxes in MATLAB. 
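
The likelihood ratio test used for these comparisons can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code: the function names and the log-likelihood values are hypothetical, and the sketch assumes the standard asymptotic result that twice the difference in maximized log-likelihoods follows a chi-square distribution with degrees of freedom equal to the number of parameters restricted under the null hypothesis.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return total


def likelihood_ratio_test(loglik_restricted, loglik_full, n_restrictions):
    """Return the LR statistic and its asymptotic chi-square p value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimizer noise
    return stat, chi2_sf(stat, n_restrictions)


# Hypothetical maximized log-likelihoods (not taken from the FHS analyses):
# the restricted model forces two parameters to coincide in carriers and noncarriers.
stat, p = likelihood_ratio_test(-10234.7, -10230.1, n_restrictions=2)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

In this sketch the null hypothesis would be rejected at the 0.05 level whenever the returned p value falls below 0.05, which corresponds to the presence of a significance symbol in Table 5.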

Equations (1) and (2) are simplified versions of a more general model in which the coefficients depend not only on the genetic factor G but also on other factors measured in the study. For example, assume that data on exercise are available for each study participant and that one wants to estimate the effects of exercise on the age trajectories of physiological indices and on mortality risk. In this case, in addition to the genetic factor G, the coefficients in Equations (1) and (2) should depend on an exercise variable Z. In the simplest case this is a “0-1” variable, where “0” means no exercise and “1” means that the person exercises. Exercise status can change the dynamics of the physiological state through Equation (1) with Z-dependent coefficients, for example, f1(t, G, Z) and a(t, G, Z). It may also influence mortality risk in Equation (2) through the Z-dependent coefficients f0(t, G, Z) and Q(t, G, Z) of this equation, as well as through changes in the physiological state, Y_t. If the sample size is large enough, then stratifying study participants into subgroups with Z = 0 and Z = 1 and estimating model parameters for each subgroup will provide information on the effect of exercise on the coefficients of these equations and on the age trajectories of people who do and do not exercise.
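
To make the role of Z-dependent coefficients concrete, the following Python sketch simulates a physiological variable separately for the strata Z = 0 and Z = 1. It assumes that Equation (1) has the mean-reverting form dY = a(t)(Y − f1(t))dt + σ dW, and it uses an Euler-Maruyama discretization. All coefficient values, the function name simulate_trajectory, and the dictionary COEFFS are hypothetical and chosen only for illustration; they are not estimates from the FHS data.

```python
import math
import random


def simulate_trajectory(y0, a, f1, sigma, t0, t1, dt, rng):
    """Euler-Maruyama path of dY = a(t) * (Y - f1(t)) dt + sigma dW."""
    t, y, path = t0, y0, [(t0, y0)]
    n_steps = int(round((t1 - t0) / dt))
    for _ in range(n_steps):
        y += a(t) * (y - f1(t)) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        path.append((t, y))
    return path


# Hypothetical Z-specific coefficients (illustrative values, not estimates).
COEFFS = {
    0: {"a": lambda t: -0.10, "f1": lambda t: 230.0 + 0.5 * (t - 40.0)},  # no exercise
    1: {"a": lambda t: -0.20, "f1": lambda t: 215.0 + 0.3 * (t - 40.0)},  # exercise
}

rng = random.Random(1)
mean_at_80 = {}
for z, c in COEFFS.items():
    # Simulate 200 individuals per stratum from age 40 to age 80.
    finals = [
        simulate_trajectory(220.0, c["a"], c["f1"], 5.0, 40.0, 80.0, 0.25, rng)[-1][1]
        for _ in range(200)
    ]
    mean_at_80[z] = sum(finals) / len(finals)
    print(f"Z = {z}: mean simulated value at age 80 = {mean_at_80[z]:.1f}")
```

In an actual analysis the direction is reversed: the stratum-specific coefficients are unknown and are estimated from the observed trajectories and mortality data within each subgroup.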

The model can be used for clarifying the mechanisms of the alternative modes of gene action discussed in Section 3.1. Indeed, the quadratic hazard (2) can be used to describe the risk of disease (e.g., CVD) onset or mortality by cause. The observed risk μ(t, G) = E(μ(t, Y_t, G) | G, T > t) (here T is age at disease onset) can be evaluated from the analyses of longitudinal data (Arbeev et al. 2009). The age trajectories of disease risks μ(t, G) specified for carriers and noncarriers of different genetic alleles may intersect at some age or coincide after some age. Such an intersection indicates that an allele may be disadvantageous only within some age interval and may be advantageous or neutral in another. These properties can also be represented in the age patterns of the corresponding probabilities of staying free of the selected disease (or survival functions) evaluated for carriers and noncarriers of the corresponding genotypes.
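
The idea of intersecting risk trajectories can be sketched with two simple hazard curves. The example below assumes Gompertz hazards purely for illustration (the functional form, the parameter values, and the function names are hypothetical and are not the observed risks estimated in this article). It computes the time at which the two hazards cross and the corresponding survival functions.

```python
import math


def gompertz_hazard(t, ln_a, b):
    """Gompertz hazard a * exp(b * t), parameterized by ln(a) and b."""
    return math.exp(ln_a + b * t)


def gompertz_survival(t, ln_a, b):
    """Survival function implied by the Gompertz hazard, with S(0) = 1."""
    return math.exp(-math.exp(ln_a) / b * (math.exp(b * t) - 1.0))


def crossing_time(ln_a1, b1, ln_a2, b2):
    """Time at which two Gompertz hazards intersect (requires b1 != b2)."""
    return (ln_a1 - ln_a2) / (b2 - b1)


# Hypothetical parameters; t is time since the start of follow-up.
carriers = (-5.0, 0.040)      # higher initial risk, slower increase
noncarriers = (-5.6, 0.050)   # lower initial risk, faster increase

t_cross = crossing_time(*carriers, *noncarriers)
print(f"hazards intersect at t = {t_cross:.1f}")
for t in (30.0, 70.0):
    print(
        f"t = {t:.0f}: hazard carriers = {gompertz_hazard(t, *carriers):.4f}, "
        f"noncarriers = {gompertz_hazard(t, *noncarriers):.4f}; "
        f"survival carriers = {gompertz_survival(t, *carriers):.3f}, "
        f"noncarriers = {gompertz_survival(t, *noncarriers):.3f}"
    )
```

Before the crossing time the hypothetical allele is disadvantageous (higher hazard in carriers), and after it the ordering of the hazards reverses, although the survival curves cross later than the hazards do.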


3.6.2. Results of Genetic Analyses Using GenSPM 

Table 5 shows estimates of parameters of the baseline hazard (μ0(t, G)), the multiplier in the quadratic part of the hazard (μ1(t, G)), the adaptive capacity (a(t, G)), the mean allostatic trajectory (f1(t, G)), and other parameters of the GenSPM applied to the data on mortality and longitudinal measurements of CH and DBP from the original FHS cohort. The table also shows the results of testing null hypotheses about coincidence of various components of the model, such as adaptive capacity and mean allostatic trajectory, in carriers and noncarriers of the APOE e4 allele and other hypotheses on dynamic characteristics of the components of the model in the genetic groups (see note after the table). We use the APOE e4 allele here because its biological properties have been widely discussed in the literature and statistical modeling adds new information about the effects of this allele on age trajectories of hidden biomarkers of aging. Figures 7 and 8 show estimated components of the model such as the logarithm of the baseline hazard, the multiplier in the quadratic part of the hazard, the adaptive capacity coefficient, and the mean allostatic trajectory for carriers and noncarriers of the APOE e4 allele evaluated from data on CH and DBP for males and females combined.

FIGURE 7. Application of Genetic Stochastic Process Model to Total Cholesterol. Source: Framingham Heart Study (a limited access dataset), original cohort.

One can see from Table 5 that the null hypotheses on the equality of baseline hazard rates in carriers and noncarriers of the e4 allele (column ln a_μ0) are rejected for both physiological variables. Figures 7 and 8, top left panels, illustrate the patterns of the logarithm of baseline hazard rates estimated for both physiological variables and both sexes. They show that noncarriers of the e4 allele have lower baseline rates at younger ages (that is, smaller ln a_μ0), but their rates increase faster (that is, they have larger b_μ0) than the rates for carriers of the e4 allele, resulting in the intersection of the rates at the oldest ages (around 100 years). This observation is in line with the findings in the literature that the effect of the e4 allele on survival diminishes with age (Ewbank 2002) and that there is a lack of association of APOE alleles with survival of centenarians (Louhija et al. 2001). The table also shows that all parameters differ between carriers and noncarriers of the e4 allele; however, some differences are not statistically significant.

The null hypotheses on the zero quadratic part of the hazard (column a_μ1 in Table 5) are rejected in all cases for DBP but not for CH. This suggests that deviations of DBP from the “optimal” trajectories result in a more substantial increase in the risk of death than in the case of CH. Figure 7, top right panel, shows a tendency for μ1(t, G) to increase with age for CH; however, this increase is not statistically significant. Figure 8, top right panel, shows faster increases in μ1(t, G) for e4 carriers in the case of DBP.

This corresponds to the narrowing of the U shape of corresponding mortality risk (as a function of DBP) with age. Hence the 
“price” for the same magnitude of deviation from “optimal” values of DBP (in terms of an absolute increase in the mortality risk 
compared to the baseline level at that age) becomes higher for carriers than for noncarriers at older ages. This can be considered 
as a manifestation of the decline in resistance to stresses with age (Arbeev et al. 2011; Yashin et al. 2007a), which is an important 
characteristic of the aging process (Robb et al. 2009; Semenchenko et al. 2004), contributing to the development of aging-related 
diseases and death. Note that e4 noncarriers have a narrower U shape for ages up to 85. It is important to note that our approach 
allows for indirect evaluation of this characteristic for carriers and noncarriers of the e4 allele in the absence of specific information 
on external disturbances (stresses) affecting individuals during their life course (such data are not available in the FHS). 
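
The “price” of a deviation can be written out explicitly. The sketch below assumes the quadratic hazard has the form μ0(t) + μ1(t)(y − f(t))², where f(t) denotes the “optimal” trajectory, so that the absolute increase in risk for a given deviation is μ1(t) times the squared deviation. All functions and numerical values are hypothetical and were chosen only to mimic the qualitative pattern described above (a faster increase of μ1 in carriers, with a narrower U shape in noncarriers before age 85).

```python
import math


def quadratic_hazard(y, t, mu0, mu1, f_opt):
    """Quadratic hazard mu0(t) + mu1(t) * (y - f_opt(t))**2."""
    return mu0(t) + mu1(t) * (y - f_opt(t)) ** 2


def excess_risk(deviation, t, mu1):
    """Absolute increase in hazard caused by a given deviation from the optimum."""
    return mu1(t) * deviation ** 2


# Hypothetical model components (illustrative only, not estimates).
def mu0(t):
    return math.exp(-9.0 + 0.08 * t)  # baseline hazard


def f_opt(t):
    return 80.0  # "optimal" DBP in mmHg, taken as constant here


def mu1_carriers(t):
    return 2.0e-6 * (t - 40.0)  # faster increase with age


def mu1_noncarriers(t):
    return 4.5e-5 + 1.0e-6 * (t - 40.0)  # larger at younger ages, slower increase


for age in (60.0, 95.0):
    e_c = excess_risk(10.0, age, mu1_carriers)
    e_n = excess_risk(10.0, age, mu1_noncarriers)
    print(f"age {age:.0f}: excess risk for a 10 mmHg deviation: "
          f"carriers {e_c:.4f}, noncarriers {e_n:.4f}")
```

A larger μ1(t) means a narrower U shape: the same deviation from the optimum costs more in absolute mortality risk.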
FIGURE 8. Application of Genetic Stochastic Process Model to Diastolic Blood Pressure. Source: Framingham Heart Study (a limited access dataset), original cohort.


The analyses also revealed different age dynamics of the adaptive capacity in carriers and noncarriers of the e4 allele for different physiological variables. The null hypothesis on the equality of the adaptive capacity in carriers and noncarriers (column a_Y in Table 5) is rejected in the case of CH. Figures 7 and 8, bottom left panels, show that carriers of the e4 allele have better adaptive capacities than noncarriers of this allele. The age dynamics of the adaptive capacity are different for CH and DBP. These observations indicate that the mechanisms underlying the decline in the adaptive capacity in carriers and noncarriers of e4 may not work universally for all physiological variables. The decline in adaptive capacity is an important feature of aging (Hall et al. 2000; Lund et al. 2002; Rankin and Kushner 2009; Troncale 1996) that may contribute to the development of aging-related diseases and death. However, direct measurements of the adaptive capacity are typically lacking in available longitudinal studies of aging, health, and longevity. The use of the feedback coefficient in the equation for the age dynamics of a physiological variable in our model allows us to evaluate the adaptive capacity indirectly from the data, because the absolute value of this feedback coefficient characterizes the adaptive capacity (Arbeev et al. 2009; Arbeev et al. 2011; Yashin et al. 2007a; Yashin et al. 2012c,d).
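
A simple way to see why the absolute value of the feedback coefficient measures adaptive capacity is to look at how quickly an expected deviation from the mean allostatic trajectory disappears. The sketch below assumes a constant negative feedback coefficient a, treats f1 as fixed over the recovery period, and ignores random disturbances, so the expected deviation decays as exp(a × elapsed time). The function names and the two coefficient values are hypothetical.

```python
import math


def recovery_half_life(a):
    """Time for the expected deviation from f1 to halve, given feedback a < 0."""
    if a >= 0:
        raise ValueError("feedback coefficient must be negative for mean reversion")
    return math.log(2.0) / abs(a)


def expected_deviation(d0, a, elapsed):
    """Expected deviation after `elapsed` time units, starting from d0."""
    return d0 * math.exp(a * elapsed)


# Hypothetical feedback coefficients for two genetic groups.
for label, a in (("stronger feedback", -0.15), ("weaker feedback", -0.07)):
    print(f"{label}: a = {a}, half-life of a deviation = {recovery_half_life(a):.1f}")
```

The group with the larger absolute feedback coefficient returns to its trajectory faster, which is the sense in which it has a better adaptive capacity.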

The null hypothesis on the equality of the mean allostatic trajectories in carriers and noncarriers (column a_f1 in Table 5) is rejected for both CH and DBP. This indicates that the processes regulating the age dynamics of physiological variables in carriers and noncarriers of the e4 allele force their age trajectories to follow different curves (which also do not coincide with the “optimal” trajectories). Figures 7 and 8, bottom right panels, show that age trajectories of both CH and DBP in e4 carriers are forced to larger values compared to noncarriers of the allele, although the difference between carriers and noncarriers diminishes at the oldest ages. Similar analyses could be performed using data on carriers and noncarriers of each of the 24 genetic variants described in Section 3.4, or on carriers and noncarriers of combinations of these variants. The results of such analyses are not shown here.

The properties of the APOE alleles and their influence on aging- and health-related traits are discussed widely. The properties of the 24 newly detected genetic variants and related genes are much less well known. In the next section we summarize available information about these genes and discuss their possible roles in the regulation of aging and longevity traits.


3.7. Genes Linked with Detected Genetic Variants: Essential Findings about 24 SNPs 

The detected genetic variants are linked with genes whose expression is crucial for maintaining the organism’s functioning. The detected variants are associated with survival both individually and jointly. The “genetic dose” index has a strong and significant effect on life span in the presence and in the absence of observed covariates. As with any other genetic variant detected in GWAS, our variants do not necessarily have negative effects on life span in every individual. Differences in personal genetic background or in exposure to external conditions may influence the sign and size of the effect. These differences also explain why changes in the sample call rates result in different sets of selected genetic variants. Requiring high sample call rates reduces the number of individuals eligible for GWAS. In the case of relatively small frequencies of the corresponding SNP alleles, the estimated association becomes sensitive to the balance between positive and negative effects of the variant in the population of study participants.

A typical requirement in GWAS is that findings be confirmed by other GWAS. However, the genetic associations with longevity traits depend to a great extent on the environment, which activates some genes and suppresses others. Populations exposed to different environments are therefore likely to show involvement of different genetic variants in life span (Yashin et al. 2012b). The chances that the detected associations are random are likely to be small: All effects are genome-wide significant, negative, and show associations with life span for males and females in separate analyses.

The negative contribution of genetic factors to life span at late ages is in line with predictions of two evolutionary theories of aging: mutation accumulation (MA) and antagonistic pleiotropy (AP). Some findings, including evolutionarily conserved genes and genetic pathways involved in the regulation of aging, are difficult to explain using the MA hypothesis of aging. Such conservation would be hard to explain if mutations occurred randomly and were retained in the genomes of various species over evolutionary time (Mitteldorf 2012). However, living organisms might develop universal nonspecific mechanisms for coping with aging to guarantee reproduction in the presence of various unpredictable genetic disturbances. The same machinery could then support postreproductive survival by compensating for the consequences of deleterious mutations with late manifestation. More studies are needed to clarify this problem.

To understand the biological relevance of the 24 “vulnerability” SNPs to aging and common diseases, we applied annotation programs to these SNPs and reviewed current evidence about their possible biological effects and the functions of the closest genes or genes linked to these SNPs. We used established online resources and tools for functional annotation and biological interpretation of genetic polymorphisms, such as the NCBI Entrez Gene and dbSNP databases, OMIM, GO, Ensembl, GeneCards, GeneTrail, and Panther, as well as relevant recent publications. Results are summarized in Table 6.

The review of current knowledge indicates that the majority of genes related to the 24 “vulnerability” SNPs could be involved in cell adhesion, apoptosis, viral infections (e.g., human cytomegalovirus, HCMV), cancer, and age-associated brain disorders. In our earlier study we found 27 pro-survival alleles (Yashin et al. 2012b) of SNPs whose closest genes were predominantly involved in similar disorders and cell responses, though the genes differed from those closest to the 24 SNPs in this study.

We also found that, from the list of 19 genes linked to the 24 SNPs, at least three (ARF1, ARFGAP1, and CORO7) were involved in Golgi vesicular transport and membrane. The Golgi apparatus is a membranous structure in the cell that processes and packages proteins made by the endoplasmic reticulum before sending them to their destination (in particular, for secretion) in secretory vesicles. In the process of functioning, cells synthesize a large number of different macromolecules. The Golgi apparatus plays an important role in preparing these macromolecules for secretion or for use within the cell. It modifies proteins delivered from the endoplasmic reticulum and is also involved in the transport of lipids around the cell and the creation of lysosomes.

Of the genes linked to our 24 SNPs and involved in Golgi processing, two (ARFGAP1 and ARF1) interact closely: ARFGAP1 promotes hydrolysis of ARF1-bound GTP, which is required for the fusion of protein vesicles with Golgi compartments. The fact that our analysis identified SNPs in functionally closely connected genes located on different chromosomes supports a real association between the cellular processes regulating protein traffic in the Golgi apparatus and survival at the oldest old ages.

A limited number of studies suggest potential mechanisms of this intriguing connection. Cho and colleagues reported that the 
structure of the Golgi complex is significantly altered in senescent cells, and this can disturb normal protein secretion by such cells 
(Cho et al. 2011). The disturbed secretion may in turn partly explain the often excessive and unbalanced release of pro-inflammatory
factors by senescent cells (Campisi et al. 2011), which may potentially contribute to both physiological aging-associated
changes and pathology. The Golgi network was also linked to mTOR signaling, and aberrant Golgi trafficking was implicated in
metastatic cancers (particularly in prostate cancer and melanoma) (Abraham 2009; Millarte and Farhan 2012; Sanchez-Laorden 
et al. 2009). 

For additional information, we also tested our list of genes for over- or under-representation of gene ontology (GO) terms 
related to particular biological processes, using GeneTrail online software for Gene Set Enrichment/Overrepresentation analysis 
(Backes et al. 2007). We compared our list of genes (corresponding to 19 of 24 SNPs that were located in genes, or linked to 
genes) with the reference set of about 14,500 genes related to ~ 156,000 SNPs located within genes and belonging to Affymetrix 
500,000 SNP array (~437,000 after quality control). The GeneTrail detected overrepresentation of genes related to the Golgi 
apparatus and membrane (4 observed vs 0.6 expected) in our list, with p value 0.03. There was also specific overrepresentation 
of genes involved in the Golgi transport vesicle coating: two observed genes (ARFGAP1 and ARF1) versus 0.013 expected, with 
p value 0.007. The false positive detection rate method was used for the multiple testing adjustment in this software. This analysis 
strongly supports the results of our review of gene functions, emphasizing the role of the Golgi apparatus in normal cell functioning
and in the organism’s chances of surviving to the oldest old ages.
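
The logic of an overrepresentation test can be sketched with a hypergeometric tail probability. The sketch below is not GeneTrail's implementation: the function name is hypothetical, and the category size K = 458 is an assumed value chosen only so that the expected count for a list of 19 genes drawn from about 14,500 reference genes is close to the 0.6 quoted above. The result is an unadjusted p value, so it is not expected to reproduce the multiplicity-adjusted value reported by the software.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which are in the category."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


N = 14_500   # approximate size of the reference gene set (from the text)
n = 19       # genes in the tested list (from the text)
K = 458      # assumed number of reference genes in the category (hypothetical)
k = 4        # observed genes from the list in the category (from the text)

expected = n * K / N
p_raw = hypergeom_upper_tail(k, N, K, n)
print(f"expected = {expected:.2f}, observed = {k}, unadjusted p = {p_raw:.4f}")
```

Because many GO categories are tested at once, the unadjusted p value for a single category is smaller than the adjusted value that is reported after the multiple testing correction.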



TABLE 6
Characteristics of 24 SNPs

| SNP name | Chr | In/out gene | Closest gene | Gene/protein function | Associated biological processes and health disorders |
|---|---|---|---|---|---|
| rs3738682 | 1q42 | Intronic | ARF-1: ADP-ribosylation factor 1 | A small GTP-binding protein having a central role in intra-Golgi vesicular protein transport. Modulates vesicle budding and uncoating. N.B. The hydrolysis of ARF1-bound GTP is mediated by ARFGAPs. Also a member of the RAS superfamily, the downstream target of PI3K (Nishida et al. 2011). | Golgi vesicles; cell proliferation; cancer. ARF-1 is highly expressed and activated in several breast cancer cell lines and was associated with migration and invasiveness (Boulay et al. 2011). |
| rs356430 | 5q31.2 | Probably LincRNA | CTB-35F21.1 (LincRNA) | CTB-35F21.1 region may contain LincRNA. | LincRNA works in complexes with proteins and performs regulatory functions such as inhibiting transcription and translation (Yoon et al. 2013). |
| rs17067605 | 5q34 | Intergenic | | | |
| rs2353447 | 8q11.1 | Intergenic | RP11-783P22.2 | | |
| rs4565533 | 9q34.3 | Intergenic | Near RXRA | N.B. No LD (r² > 0.5) found between rs4565533 and SNPs in RXRA (retinoid X receptor, alpha). | |
| rs7894051 | 10q26.2 | Intronic | ECHS1: enoyl CoA hydratase, short chain, 1, mitochondrial | ECHS1 catalyzes the hydration of 2-trans-enoyl-coenzyme A (CoA). | Mitochondrial oxidation has been implicated in cancer (metastatic melanoma) (Lake et al. 2011); downregulation potentiates apoptosis (Liu et al. 2010). |
| rs1440483 | 11q25 | Intronic | B3GAT1: beta-1,3-glucuronyltransferase 1 | B3GAT1 is involved in biosynthesis of the carbohydrate epitope HNK-1 (human natural killer-1, also known as CD57), which is present on a number of neural cell adhesion molecules (Jeffries et al. 2003). See also KLRD1 below. | Associated with schizophrenia (Kahler et al. 2011); CD57 expression was linked to HCMV and better survival from cancer (Hendricks et al. 2014; Nielsen et al. 2013). |
| rs1794108 | 11p15.5 | Exon-nonsyn coding | PSMD13: proteasome 26S subunit, non-ATPase, 13 | PSMD13 acts as a regulatory subunit of the 26S proteasome involved in the ATP-dependent degradation of ubiquitinated proteins. There is high LD between PSMD13 and SIRT3. Sirtuins (SIRT1-7) play a central role in epigenetic gene silencing, DNA repair, cell cycle, microtubule organization, and aging (Giblin et al. 2014). | Aging, longevity, cancer. PSMD13-SIRT3 haplotype pools were significantly different between centenarians and younger people (Bellizzi et al. 2007). SIRT3 has a role as a tumor promoter or tumor suppressor, depending on context (Alhazzazi et al. 2011). |
| rs5743998 | 11p15.5 | Intronic | TOLLIP: toll interacting protein | TOLLIP regulates inflammatory signaling and is involved in interleukin-1 receptor trafficking and in the turnover of IL1R-associated kinase. Inhibits cell activation by microbial products. Inhibits IRAK1 phosphorylation and kinase activity. | Downregulated in the aging brain (Cribbs et al. 2012) and in gastric cancers, in response to H. pylori infection (Pimentel-Nunes et al. 2013). |
| rs9971555 | 11p13 | Intronic | ABTB2: ankyrin repeat and BTB (POZ) domain containing 2 | | |
| rs10845099 | 12p13 | Intergen (linked to KLRD1 gene) | Linked to KLRD1: killer cell lectin-like receptor subfamily D, member 1 | Located between GABARAPL1 (GABA(A) receptor-associated protein like 1) and KLRD1. KLRD1 (CD94) is expressed on NK cells and plays a role in recognition of HLA molecules by NK cells. | KLRD1 may be involved in immune response to viruses (Fang et al. 2011). N.B. CD94 and CD57 are both NK receptors and may both be involved in response to HCMV infection (Hendricks et al. 2014). |
| rs1399453 | 12q23.1 | Intronic | ANO4: anoctamin 4 | Member of a family of Ca2+-activated Cl- channels (Tian et al. 2012). | Other members of the anoctamin family are thought to play a role in cancer and apoptosis (Wanitchakool et al. 2014). |
| rs4904670 | 14q32.11 | Intronic | NRDE2 (C14orf102): necessary for RNA interference, domain containing | NRDE2 is linked to the nearby gene PSMC1: proteasome (prosome, macropain) 26S subunit, ATPase, 1. The latter may potentially interact with PSMD13 (host gene for rs1794108 above). | N.B. One of our earlier identified 39 pro-survival SNPs (rs2282032) is in another intron of the same gene (Yashin et al. 2010c). |
| rs9925881 | 16p12.2 | Intergenic | Between TRNAL7 and EEF2K | N.B. No proxy SNPs (by LD) were found by SNAP near (± 500K distance) this SNP. Closest gene is EEF2K, a highly conserved protein kinase that links cell surface receptors to cell division. | The activity of EEF2K is increased in many cancers. |
| rs9928967 | 16p13.3 | Exon-nonsyn coding | CORO7: coronin 7; a.k.a. CORO7-PAM16 read-through | CORO7 plays a role in Golgi complex morphology and function. PAM16 is suspected to be involved in increased rates of anaerobic metabolism, resistance to apoptosis, and altered growth-factor sensitivity. | Golgi; characteristic of cancer cells; apoptosis. |
| rs5491 | 19p13.3 | Exon-nonsyn coding | ICAM1: intercellular adhesion molecule 1; a.k.a. CD54 | A cell surface glycoprotein that is typically expressed on endothelial cells and cells of the immune system. Mediates intercellular adhesion. | Adhesion; in case of rhinovirus infection acts as a cellular receptor for the virus. Pleiotropic gene implicated in multiple health disorders: CVD, inflammation, AD, PD, schizophrenia, RA. |
| rs2586484 | 17q21.33 | Intergen (linked to several genes) | Near COL1A1; in LD with FAM117A, CACNA1G | The SNP is in perfect LD with 7 surrounding SNPs located in or near a synteny block of genes with evolutionarily conserved order, especially in FAM117A and CACNA1G. | |
| rs8081943 | 17p11.2 | Intronic | RAI1: retinoic acid induced 1 | Located within the Smith-Magenis syndrome region. May function as a transcriptional regulator. Regulates transcription through chromatin remodeling. | May be important for embryonic and postnatal development. May be involved in neuronal differentiation (by similarity), neurological disorders. Pleiotropic effect: associated with both the severity of the phenotype and the response to medication in schizophrenic patients and with Parkinson’s disease. |
| rs9896996 | 17p13.3 | Intergen (linked to SMG6) | Between MIR212 and MIR132. In LD with SMG6 (smg-6 homolog, nonsense-mediated mRNA decay factor). | In LD with rs62067977 (r² = 0.62; D' = 0.79), which is an intronic SNP of the SMG6 gene. Its protein is part of the telomerase complex and binds single-stranded DNA at the telomeres. SMG6 also participates in mRNA decay. | |
| rs11536959 | 20q11.23 | Intronic | LBP: lipopolysaccharide binding protein | Binds to the lipid A moiety of bacterial lipopolysaccharides (LPS), a glycolipid present in the outer membrane of all Gram-negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor. | Involved in the acute-phase immunologic response to Gram-negative bacterial infections. Levels of LBP were increased 24 h after hip fracture surgery. |
| rs6090342 | 20q13.33 | Intronic | ARFGAP1: ADP-ribosylation factor GTPase-activating protein 1 | Promotes GTP hydrolysis on the small G protein Arf-1 on Golgi membranes. Involved in membrane trafficking and/or vesicle transport. | Overexpression induces the redistribution of the entire Golgi complex to the endoplasmic reticulum. Role in cell adhesion, migration, and cancer invasion (Sabe et al. 2006). |
| rs2838566 | 21q22.3 | Intergen | Between LRRC3 and TSPEAR | N.B. No proxy SNPs (in LD with rs2838566) were found within ± 500,000 distance. | |
| rs6007952 | 22q13.3 | Intronic | GRAMD4: GRAM domain containing 4 | Mitochondrial effector of E2F1 (MIM 189971)-induced apoptosis. Plays a role as a mediator of E2F1-induced apoptosis in the absence of TP53/p53. | Apoptosis |
| rs8135777 | 22q13.3 | Intronic | SHANK3: SH3 and multiple ankyrin repeat domains 3 | Shank3 is a large scaffold postsynaptic density protein implicated in dendritic spine maturation and synapse formation; it regulates the structural organization of neurotransmitter receptors in postsynaptic dendritic spines, making it a key element in chemical binding crucial to nerve cell communication. | SHANK3 is crucial in receptor tyrosine kinase signaling and mediates sustained Erk-MAPK and PI3K signaling. Information processing; associated with autism; haploinsufficiency. |

Source: When nonreferenced, the information about gene name and molecular function has been taken from established online sources, including NCBI (Entrez Gene, dbSNP, etc.), GO, Ensembl, GeneCards, OMIM, GeneTrail, PANTHER, and most recent literature.
Note: Essential characteristics of 24 SNPs and their closest genes associated with lower survival at older ages (80+).


SNP rs4904670 is located in an intron of NRDE2 (a.k.a. C14orf102). One of the 39 pro-survival SNPs found earlier, rs2282032 (Yashin et al. 2010c), is located in another intron of the same NRDE2 gene. NRDE2 codes for “NRDE-2, necessary for RNA interference, domain containing protein” and is poorly studied so far. Some SNPs of this gene, including the earlier found rs2282032, are, however, in LD with the neighboring gene PSMC1 (proteasome 26S subunit, ATPase, 1), which may potentially interact with the PSMD13 gene coding for another proteasome 26S subunit (Ewing et al. 2007). PSMD13, in turn, is the host gene for rs1794108, another of the 24 identified SNPs. That is, two of the 24 newly identified SNPs (rs4904670 and rs1794108) and one of the 39 SNPs from an earlier study (Yashin et al. 2010c) relate to two potentially interacting proteasome subunits.

SNP rs1794108 is located in an exon of the PSMD13 gene, which codes for a proteasome 26S subunit and is accordingly involved in protein degradation by the proteasome. PSMD13 is also in strong LD with the neighboring SIRT3 gene, a member of the silent information regulator 2 (Sir2) family of histone deacetylases (sirtuin HDACs). Sirtuins (SIRT1-7) are mammalian homologues of the Sir2 gene in yeast and play a central role in epigenetic gene silencing, DNA repair and recombination, cell cycle, microtubule organization, and the regulation of aging (Mahlknecht and Voelter-Mahlknecht 2011). SIRT3 also has a role as a tumor promoter or tumor suppressor, depending on context (Alhazzazi et al. 2011). PSMD13-SIRT3 haplotype pools differ significantly between centenarians and younger people (Bellizzi et al. 2007). The close relation of the identified SNP to candidate aging and longevity genes supports its true association with life span.

For the majority of evaluated SNPs, the minor allele displays a kind of trade-off effect on female (but not male) survival. That is, it is associated with worsened survival at the oldest old ages but with better survival at younger ages.

Overall, after age 80, the minor (“frailty”) alleles may (1) increase cancer risk in females but be protective against cancer in 
males; (2) increase the risk of AD in females (for most SNPs: 17 of 24), and in males (for 7 of 24 SNPs) but be protective against 
AD in males (for about 50% of SNPs); and (3) increase risk of CHD in males (for most SNPs), and in females (for more than half 
of SNPs: 15 of 24). 

There are no centenarians among those carrying more than two “frailty” alleles, whereas almost 5% of those carrying two or fewer such alleles are centenarians. This indicates that a major condition for achieving extreme longevity could be the absence of detrimental alleles. It may be even more important than the presence of alleles providing particular benefits to their carriers.

The negative effect of minor (“vulnerability”) alleles on survival after age 80 may in part be explained by higher risks of CHD, which may contribute to mortality after age 80 more substantially than cancer. An earlier decline in BMI among carriers of two or more of these alleles is an indirect indication of possible accelerated aging or a decline in stress resistance.


4. DISCUSSION 

Better understanding of the genetic mechanisms linking individual aging-related changes, declining health status, and increasing mortality risk has become a matter of high practical importance in gerontology and geriatrics. One reason for this is the need to improve the health of the elderly in the rapidly aging populations of the developed part of the world. The idea of improving population health and longevity by properly affecting individual aging-related processes stimulated studies of factors and mechanisms that link individual aging with observed variations in health span and life span (Butler et al. 2008; Butler et al. 2004; Goldman et al. 2013). A substantial part of this research addressed questions about the roles of genes in aging-related changes and life span. Initially these studies dealt with analyzing associations of candidate genes with aging and longevity traits (De Benedictis et al. 2001; Finch and Tanzi 1997). Advances in genotyping technology provided researchers with a large amount of data on the genetic backgrounds of a large number of individuals for whom phenotypic longitudinal data were collected. This information stimulated numerous genome-wide association studies of complex traits with the hope of clarifying the genetic mechanisms involved in their regulation. However, the expectation that genome-wide association studies would rapidly explain connections between genetic factors and the traits of interest has not been met. The results of many studies were controversial. The sets of genetic variants associated with aging and longevity traits depended on the population under study. Most estimated associations of genetic variants with aging and longevity traits have not reached genome-wide levels of statistical significance.

For many complex traits the genetic variants detected in GWAS of these traits explain only a small portion of genetic variability 
in these traits, indicating that many more influential genetic factors and regulatory mechanisms still have to be discovered (Eichler 
et al. 2010; Makowsky et al. 2011; Manolio et al. 2009; Marian 2012; Zuk et al. 2012). The use of GWAS involves testing 
associations for large numbers of genetic variants. To reduce the numbers of false positive findings, such studies use correction for 
multiple testing (Gao et al. 2010; Li et al. 2012; Nyholt 2004). As a result, the statistical conclusions from data analyses become very 
conservative because many genetic variants that showed associations with such traits at the nominal level of statistical significance 
have not reached a genome-wide level of significance. The research findings were difficult to explain from the evolutionary theory 
point of view. This situation indicates the need for developing new concepts and better methods for analyzing genetic data on such 
traits. 
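
To illustrate how conservative such a correction can be, the following minimal sketch applies the simplest multiple-testing rule, the Bonferroni correction, to a scan of 550,000 SNPs. The Bonferroni rule is used here only for illustration and is not necessarily the procedure used in the cited studies; the function names and the five p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def count_significant(p_values, alpha=0.05, n_tests=None):
    """Return (n_nominal, n_corrected): p-values below alpha and below alpha / n_tests."""
    n = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, n)
    n_nominal = sum(p < alpha for p in p_values)
    n_corrected = sum(p < threshold for p in p_values)
    return n_nominal, n_corrected


# Hypothetical p-values for five SNPs taken from a 550,000-SNP scan.
example_p = [3e-9, 2e-6, 4e-4, 0.01, 0.2]
threshold = bonferroni_threshold(0.05, 550_000)
nominal, corrected = count_significant(example_p, alpha=0.05, n_tests=550_000)
print(f"threshold = {threshold:.2e}, nominal = {nominal}, corrected = {corrected}")
```

In this hypothetical example four of the five variants are significant at the nominal 0.05 level, but only one passes the corrected threshold of roughly 9 × 10^-8.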

The results of genetic analyses of aging- and longevity-related traits produced additional challenges for the researchers. Most 
genetic variants detected in earlier candidate-gene studies of such traits failed to replicate in association studies of the same 
traits in independent populations (Barzilai et al. 2012; Broer et al. 2015; Lunetta et al. 2007; Nebel et al. 2011; Newman et al. 
2010; Walter et al. 2011). The results of GWAS of human aging and life span demonstrated sensitivity to the type of statistical 
model used in the allele (genotype) selection procedure (Yashin et al. 2012b). This indicates that conclusions from such analyses 
have to be used with care, and that mechanisms of genetic influence on aging and life span require more accurate descriptions. In 
this article we showed that the results of analyses are also sensitive to the set of rules used in the QC procedures, which specify 
sets of genetic factors and fix the sample of study subjects appropriate for genetic analyses. The fact that different researchers 
performing such procedures often use different sets of rules casts doubts on the legitimacy of comparison of the results of different 
studies, as well as on the results of meta-analyses of the research findings in which different QC procedures have been used. 
Another problem in genetic studies of aging and longevity is that the treatment of linkage disequilibrium (LD) and Hardy-Weinberg 
equilibrium (HWE) in these studies follows standards used in the analyses of other complex traits. However, this strategy is likely 
to be erroneous in studying aging- and longevity-related traits because it ignores the fact that the mortality selection process in a 
genetically heterogeneous population affects both LD and HWE parameters as population cohorts grow older. 
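
The effect of mortality selection on HWE can be sketched with a deterministic single-locus calculation. The example below assumes Hardy-Weinberg proportions at birth and genotype-specific probabilities of surviving to some advanced age; the survival values, the function names, and the summary statistic F (the relative deficit of heterozygotes) are illustrative assumptions and are not taken from the FHS analyses. LD is not addressed in this sketch.

```python
def genotype_freqs_after_selection(q, survival):
    """Genotype frequencies among survivors.

    q        -- minor-allele frequency at birth (Hardy-Weinberg proportions assumed)
    survival -- (s0, s1, s2): probability of surviving to the age of interest
                for carriers of 0, 1, and 2 copies of the minor allele
    """
    p = 1.0 - q
    at_birth = (p * p, 2.0 * p * q, q * q)
    weighted = [f * s for f, s in zip(at_birth, survival)]
    total = sum(weighted)
    return tuple(w / total for w in weighted)


def hwe_summary(freqs):
    """Minor-allele frequency and heterozygote deficit F = 1 - H_obs / H_exp."""
    g0, g1, g2 = freqs
    q = g2 + 0.5 * g1
    expected_het = 2.0 * q * (1.0 - q)
    return q, 1.0 - g1 / expected_het


# Hypothetical survival probabilities to an advanced age, by number of minor alleles.
scenarios = {
    "multiplicative": (0.4, 0.2, 0.1),  # each copy halves survival
    "recessive": (0.3, 0.3, 0.1),       # only two copies reduce survival
}
results = {}
for name, survival in scenarios.items():
    results[name] = hwe_summary(genotype_freqs_after_selection(0.3, survival))
    q_old, f_old = results[name]
    print(f"{name}: minor-allele frequency {q_old:.3f}, F = {f_old:+.3f}")
```

In both scenarios the frequency of the "vulnerability" allele among survivors falls below its value at birth. When the allele acts multiplicatively on survival, the survivors remain in Hardy-Weinberg proportions at the new allele frequency; with the recessive pattern the survivors show an excess of heterozygotes, so an HWE filter applied to an old cohort could flag a variant because of selection rather than genotyping error.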

In our recent analysis (Yashin et al. 2012b) we showed that sets of genetic variants selected in GWAS of human longevity 
depend on a statistical model describing the connection between longevity traits and genetic variants. To reduce the effect of the 
model on the selection results, we performed GWAS of life span using six such models and identified an overlapping set of 27 
SNPs showing an effect on life span across all six procedures. We found that the “longevity SNPs” were located in/near genes 
largely involved in cell proliferation and apoptosis/senescence pathways. We investigated the polygenic influence on life span 
and its possible biological mechanisms. We estimated joint effects of pro-longevity SNP alleles in the FHS 550K SNPs data on 
human survival. We found that 27 alleles, each positively associated with life span, show a significant additive influence on life span. 
The majority of these SNPs (74%) were within genes, compared to 40% of SNPs in the original 550K set. A review of the current 
literature showed that the respective genes are largely involved in aging, cancer, and brain disorders. An important finding was 
that pro-longevity genes identified in this study had functional relevance to aging and aging-associated diseases, which supports 
a causal relationship between these genes and life span. The fact that genes found in our and other genetic association studies of 
aging and longevity have similar functions indicates high chances of true positive associations for the corresponding genetic variants 
(Yashin et al. 2012). 

In this article we also addressed possible causes of the low efficiency of genetic association studies of human aging 
and longevity traits. The results of our analyses demonstrated that genetic variants may have pleiotropic associations with 
such traits. Some genes may positively influence survival at given age intervals. The effects of these genes may become neutral 
or reverse direction at other age intervals. The genetic effects are also likely to be modulated by other genes, 
as well as by environmental factors and living conditions. In addition, aging and longevity traits can be genetically heterogeneous. 
This means that the same value of a trait can be reached under the influence of different genes or sets of such genes. As a result, 
genome-wide association studies performed without taking these features into account are likely to detect only weak genetic signals. 
This also explains why detected associations are difficult to replicate in studies of independent populations. 

Our analyses showed that aging and longevity traits have an important feature that distinguishes them from other complex 
multifactorial traits subjected to genetic analyses. This feature is manifested in the process of mortality selection that takes place in 
genetically heterogeneous populations. The fact that genes affecting aging and life span are involved in these processes indicates 
that methods currently used in GWAS of other complex traits might be inappropriate for studying genetics of aging and longevity. 
At the same time this feature opens new opportunities for genetic analyses of these traits using biodemographic concepts and 
methods. We showed how these new approaches can substantially improve the quality of genetic analyses. 

The genetic analyses of complex traits often deal with a population of study participants that consists of subgroups of individuals 
having different ancestry. The possibility of such a situation, known as “population stratification,” has to be taken into account in 
genetic analyses of complex traits. It is important to note that an additional source of population stratification is the process of mortality 
selection in heterogeneous populations. This is the process in which the “longevity” or “vulnerability” alleles or genotypes can 
modify the genetic structure of the populations of the old and oldest-old compared to the younger groups of individuals. This property 
indicates that controlling for possible population stratification, for example, due to the differences in ancestry (Ma and Amos 2010; 
Price et al. 2006; Yang et al. 2011), has to be done with care because it could substantially reduce association of genetic variants 
with aging and longevity traits. 

Note that many participants of the Framingham Heart Study are related individuals whose life spans and other health and 
longevity-related traits are dependent variables. In the pre-genomic era such dependences were taken into account using multivariate 
survival models (Yashin and Iachine 1999). The correlation among life spans of related individuals was described using the notions 
of hidden correlated (shared) frailties, that is, unobserved non-negative random variables that mediate genetic influence on life span. 
The genetics of life span and other longevity-related traits, including hidden frailty, was investigated by calculating narrow-sense 
heritability of these traits (Yashin et al. 1999a,b; Yashin and Iachine 1999). The heritability estimates provide researchers with 
information about the proportion of phenotypic variance that can be explained by additive genetic factors in additive models 
of phenotypic traits. Today, when a substantial part of information about the genetic structure of study participants is not hidden but 
is available for analysis in the sets of hundreds of thousands and millions of genetic loci (e.g., SNPs), the heritability estimates 
indicate how successful the results of GWAS of corresponding traits are expected to be. The results of recent analyses of genetic 
data for a number of phenotypic traits detected a problem of “missing heritability”: The estimates of phenotypic variation induced 
by additive genetic factors calculated from available genetic data are lower than corresponding estimates of these characteristics 
obtained from genealogical or twin data (Clarke and Cooper 2010; Katsios and Roukos 2011; Makowsky et al. 2011; Manolio 
et al. 2009; Zuk et al. 2012). One should keep in mind, however, that heritability estimations are based on oversimplified models, 
so the results of such analyses for many important traits, analyzed today, have to be used with care. In many cases, the components 
of phenotypic variability of a trait explained by additive genetic variation can be calculated directly from the data. The dependence 
among family members influences the results of GWAS and has to be taken into account in the analyses of genetic data. This 
can be done with specially developed software, for example, using mixed effect models (Kang et al. 2010) or shared frailty models 
implemented in R (e.g., the coxme package). 
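
For reference, the two quantities discussed in this paragraph can be written in standard textbook notation. The symbols below are introduced here for illustration only and may differ from the notation used in the cited papers.

```latex
% Narrow-sense heritability: share of phenotypic variance
% attributable to additive genetic variance
h^2 = \frac{\sigma_A^2}{\sigma_P^2}

% Frailty model: conditional mortality rate of individual i at age x,
% given an unobserved non-negative frailty Z_i and a baseline hazard \mu_0(x)
\mu_i(x \mid Z_i) = Z_i \, \mu_0(x)
```

In a shared frailty model related individuals have the same value of Z, whereas in a correlated frailty model their frailties are distinct but correlated random variables.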

An important advantage of using longitudinal data in genetic studies of aging is the opportunity to compare survival functions 
for carriers and noncarriers of selected genetic variants or groups of such variants. However, genes do not affect life span directly. 
Their effects on this trait are mediated by aging-related changes in health status, physiological variables, and other biomarkers. 
One more advantage of using such data is an opportunity to show how genetic variants affecting life span influence average age 
trajectories of physiological variables for carriers and noncarriers of these variants. 
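
A comparison of survival functions of this kind can be sketched with the Kaplan-Meier estimator, which handles right-censored follow-up data. The implementation below is a minimal illustration in plain Python; the ages and event indicators for the two groups are hypothetical and are not FHS data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- age at death or at censoring for each individual
    events -- 1 if death was observed at that age, 0 if the record was censored
    Returns a list of (age, survival_probability) at each age with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone who dies or is censored at age t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical ages at death (event = 1) or censoring (event = 0).
carrier_curve = kaplan_meier([70, 75, 80, 82, 85], [1, 1, 0, 1, 1])
noncarrier_curve = kaplan_meier([78, 84, 88, 90, 95], [1, 1, 1, 0, 1])
for label, curve in (("carriers", carrier_curve), ("noncarriers", noncarrier_curve)):
    print(label, [(age, round(s, 3)) for age, s in curve])
```

In practice the two estimated curves would be compared formally, for example with a log-rank test, and the relatedness of study participants would have to be taken into account.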

Note that physiological variables represent observed components of the biological mechanism involved in regulation and 
manifestation of the aging process. Some important components of this mechanism, including variables describing resistance to specific 
stresses, adaptive capacities, age-dependent physiological norm, effects of allostatic adaptation, and allostatic load, are hidden and 
not measured in most longitudinal studies. The importance of these biomarkers for understanding the process of aging has been 
emphasized in a number of experimental studies using animal model systems. We showed how these biomarkers can be 
incorporated in the model of individual aging with observed and unobserved components, and how these components can be evaluated for 
individuals with different genetic backgrounds. The results of these analyses demonstrated benefits of using biodemographic 
principles and integrative statistical models of mortality risks in genetic studies of human aging and longevity. The application of the 
GenSPM revealed different patterns of regularities in aging-related characteristics (adaptive capacity, decline in stress resistance, 
mean allostatic trajectories, and the baseline hazard rate) in carriers and noncarriers of the APOE e4 allele. Such aging-related 
characteristics cannot be calculated directly from the longitudinal data because of the lack of respective measurements. 

The analyses confirmed that genetic influences on life span are realized through dynamic mechanisms regulating changes in 
physiological variables during the life course. The average aging-related changes in the eight selected physiological variables are 
likely to be driven by hidden components of aging changes and by genetic factors. The ability of advanced methods of statistical 
modeling to estimate hidden components of aging changes in humans indicates that the approach can be further extended to 
perform more comprehensive analyses of available data by incorporating relevant biological knowledge about aging into statistical 
models. The use of such models in statistical analyses of data will help researchers untangle complex age-dependent dynamic 
relationships among biomarkers and elucidate roles of genes and nongenetic factors in aging, health, and life span. 

The results of genome-wide association studies of complex traits, such as life span or age at onset of chronic diseases, suggest 
that such traits are typically affected by a large number of small-effect alleles. Individually, such alleles have little predictive value; 
therefore, they were usually excluded from further analyses. An important finding of our study was that detected alleles may jointly 
influence life span so that the resulting influence can be both substantial and highly statistically significant. We show that this 
joint influence can be described by a relatively simple “genetic dose–phenotypic response” relationship. The estimated effect of 
the polygenic score index on life span was substantial and highly statistically significant. 
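
A "dose-response" relationship of this type can be sketched as a regression of life span on an allele-count score. The example below assumes an unweighted score (the number of selected alleles carried) and ordinary least squares; both choices, the function names, and the four-person data set are illustrative assumptions and may differ from the score definition and survival models used in the analyses described above.

```python
def polygenic_score(genotypes):
    """Unweighted score: total number of selected alleles carried across SNPs.

    genotypes -- per-SNP allele counts (0, 1, or 2) for one individual
    """
    return sum(genotypes)


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical allele counts at three "vulnerability" SNPs and life spans (years).
genotype_table = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [2, 1, 1]]
life_span = [90, 87, 84, 78]

scores = [polygenic_score(g) for g in genotype_table]
slope, intercept = least_squares_line(scores, life_span)
print(f"scores = {scores}, change in life span per allele = {slope:.1f} years")
```

With real data the slope would be estimated together with its standard error, and a survival model that accommodates censoring and family structure would usually replace the simple linear fit.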

The results of our analyses indicate that genetic studies of human aging and longevity traits will benefit from adjusting conceptual 
models used in GWAS to the inherent complexity of these traits. The new models have to take into account the multifactorial nature, genetic 
heterogeneity, and pleiotropic effects of genetic factors on these traits, as well as biodemographic aspects of the 
problem. The biodemographic aspects related to the aging and longevity traits make them different from other complex traits. The 
process of mortality selection in genetically heterogeneous populations, in which some genes influence mortality risks, produces 
changes in the genetic structure of the population with increasing age. Such changes cannot be safely ignored in genetic association 
studies of these traits. Our results also show that exceptional longevity may result not only from the presence of “longevity” 
alleles but also from the absence of large numbers of “vulnerability” alleles. The existence of such vulnerability variants 
supports the mutation accumulation and antagonistic pleiotropy hypotheses of aging. According to the FHS-calculated 
frequencies, the detected “vulnerability” alleles are common genetic variants. However, according to their frequencies in HapMap 
and in the 1000 Genomes Project, they are rare variants; this discrepancy could partly be related to possible genotyping errors in the FHS data. This 
indicates that chances of replicating these results using data on independent populations might be small. The use of longitudinal 
data in genetic association studies opens additional opportunities for studying genetics of human aging and longevity. Using these 
data one can evaluate age trajectories of survival functions for carriers and noncarriers of respective alleles (genotypes) or average 
age trajectories of physiological variables. Using advanced methods of statistical modeling one can reveal genetic influence on 
age trajectories of hidden biomarkers of aging that play fundamental roles in aging-related changes but are not measured directly 
in longitudinal studies. 

The analyses of functional properties of genes associated with the 24 detected SNPs indicate that all of them are involved in 
regulation of important biological processes relevant to aging and longevity traits. Integrating this information with that obtained 
in the analyses of longitudinal data on age trajectories of physiological indices and other biomarkers will substantially clarify 
biological mechanisms of aging and longevity and generate new insights into personalized approaches to maintaining high health 
standards and longevity in population cohorts. 

Using advanced methods is extremely important for efficient analyses of available genetic data. 
However, the use of efficient statistical methods does not protect researchers from getting false-positive findings if the quality of 
genotyping is low. The polygenic score indices constructed from a set of false-positive genetic variants are likely to produce 
false-positive joint effects if the effects of individual variants are caused by common confounders. This means that the quality of 
genotyping and the presence of possible confounders have to be carefully investigated. Because the quality of genotyping improves 
with advancing technology, genotyping errors are likely to be found in data that were prepared for genetic analyses 
decades ago. This issue is particularly relevant for genetic data on members of the original FHS cohort used in our analyses. This 
means that the biological interpretations of the research findings described above have to be considered preliminary. They have to be 
confirmed by further genetic analyses of related traits in other datasets with better-quality genetic data. The issues of 
the quality of genotyping and methods of testing it were partly discussed in Yashin et al. (2015). 